Abstract — This paper develops an envelope-based approach 
to establish a link between information and queueing theory. 
Unlike classical, equilibrium information theory, information 
envelopes focus on the dynamics of sources and coders, using 
functions of time that bound the number of bits generated. 
In the limit the information envelopes converge to the average 
behavior and recover the entropy of a source, respectively, the 
average codeword length of a coder. In contrast, on short time 
scales and for sources with memory it is shown that large 
deviations from known equilibrium results occur with non- 
negligible probability. These can cause significant network delays. 
Compared to well-known traffic models from queueing theory, 
information envelopes consider the functioning of information 
sources and coders, avoiding a priori assumptions, such as 
exponential traffic, or empirical, trace-based traffic models. Using 
results from the stochastic network calculus, the envelopes yield 
a characterization of the operating points of source coders by 
the triplet of capacity, delay, and error. In the limit, assuming an 
optimal coder the required capacity approaches the entropy with 
arbitrarily small probability of error if infinitely large delays 
are permitted. We derive a corresponding characterization of 
channels and prove that the model has the desirable property of 
additivity, which allows analyzing coders and channels separately. 



I. Introduction 

Originating from the seminal works by Shannon in 1948, 
the tremendous progress in information and coding theory 
has enabled numerous ground-breaking applications that range 
from digital communications to data storage and processing. 
The fundamental results of information theory are asymptotic 
limits for the transmission of information by a source over a 
channel. Information theory defines the notion of entropy and 
channel capacity as the expected information of a source and 
the maximum expected transinformation of a channel. Coding 
theory devises practical source and channel codes for data 
compression and reliable transmission that seek to approach 
the limits established by the entropy and the channel capacity, 
respectively [11]. 

In networking, information theory has not yet become widely 
accepted. A major challenge for establishing a network 
information theory is due to the properties of network data 
traffic, which is highly variable and delay-sensitive [14]. In 
contrast, information theory mostly neglects the dynamics 
of information and capacity and focuses on averages, respectively, 
asymptotic limits. Typically, these limits can be 
achieved with arbitrarily small probability of error, assuming, 
however, arbitrarily long codewords and, as a consequence, 
arbitrarily large coding delays [3]. In networking, however, 
delay is a key performance parameter that can be traded for 
capacity or loss using results from queueing theory. Moreover, 
considering the variability of sources is essential in packet data 
networks as it enables significant resource savings due to 
statistical multiplexing [14]. 

The analytical cornerstone of networking is queueing theory, 
which dates back to the works on the dimensioning of circuit-switched 
networks by Erlang in 1909 and 1917. In 1962 Kleinrock 
advanced the theory and proved the resource efficiency 
of packet-switching that is achieved by bursty sources due 
to resource sharing. For packet-switched networks queueing 
theory can provide exact solutions for backlogs and delays 
that occur due to the variability of packet inter-arrival and 
service times. Typically, the inter-arrival and service times 
obey a certain distribution by assumption, e.g., exponential. 
Recent approaches like the theory of effective bandwidths |^, 
[24], deterministic network calculus [8], [12], [25], and the 
stochastic network calculus [8], [9], [13], [15], [22], [26] 
compute performance bounds for a wider range of stochastic 
processes. Despite the need, e.g., for joint coding and 
scheduling problems or for cross-layer optimization, a tight 
link between these models and information theory has not been 
established so far [3], [14], [22]. 

To bridge the gap towards queueing theory, a non-equilibrium 
information theory that can model the variability and delay-sensitivity 
of real sources is required [3], [14]. While [14] 
envisions "effective bandwidth versus distortion functions," [3] 
proposes the idea of "throughput-delay-reliability-triplets" to 
characterize mobile ad-hoc networks. As potentially promising 
candidate theories, [3], [14], [22] mention effective bandwidths, 
large deviations, or the stochastic network calculus, however, 
without providing any details, and conclude that unifying 
information and queueing theory remains one of the most 
important challenges. 

In this paper we formulate a non-equilibrium theory of in- 
formation sources and source coders combining methods from 
information theory and effective bandwidths, respectively, 
the stochastic network calculus. We characterize information 
sources by envelope functions that are statistical bounds on 
the amount of information generated by the source in a time 
interval of defined width. While on short time-scales the 
envelopes can exceed the entropy considerably, they approach 
the entropy on long time-scales and converge in the limit. 
We derive such information envelopes for memoryless sources 
and develop a technique for analysis of Markov sources. We 
find that the memory of a source significantly increases the 
envelope compared to its entropy and that it leads to a slower 
convergence. Using a sample path argument for the envelopes 
we derive a notion of the achievable capacity-delay-error-tradeoff 
of a coded source. We recover known asymptotic 
results: as the capacity approaches the average codeword length, 
the delay tends to infinity for any non-trivial probability 
of error. We show the capacity-delay-error-tradeoff for different 
coders, including Huffman, Shannon, and Lempel-Ziv. We 
find that the coder with the smallest average codeword length 
does not necessarily achieve the best delay performance. We 
prove that our model has the favorable property of additivity, 
permitting the independent analysis of sources and channels. 
We expect that our model enables further joint information- 
and queueing-theoretical investigations that have the potential 
to provide substantial new insights and applications from a 
holistic analysis of communications networks. 

The remainder of this paper is structured as follows. In 
Sec. II we introduce envelope processes and develop the 
queueing model that we apply in Sec. III to characterize and 
analyze information sources and coders. In Sec. IV we show 
how to apply our model to analyze the transmission of coded 
sources via a Gilbert-Elliott channel and in Sec. V we discuss 
related works. We provide brief conclusions in Sec. VI. 

II. Envelopes and Performance Bounds 

In this section we introduce the concept of statistical envelopes 
that are the basis of this work. We use the analytical 
framework of the stochastic network calculus established 
in [9], p3| to compute statistical performance bounds of the 
type $P[\text{backlog} > y] \le \varepsilon$ or $P[\text{delay} > y] \le \varepsilon$ from envelopes. 
For a broader overview see, e.g., p7), p2). In Sec. II-A 
we develop our model of sources and channels and prove its 
additivity. In Sec. II-B we assemble a method for construction 
of statistical envelopes from results on exponentially bounded 
burstiness p2|, p9| and on envelopes |J9), p6). 

A. Legendre Transform Model 

We use a discrete time model $t \in \mathbb{N}_0$. Denote $A(t)$ the 
cumulative arrivals at a system, i.e., the cumulative number 
of bits generated by a source in the interval $[0,t]$. By definition 
$A(t)$ is a non-negative and non-decreasing random 
process. By convention $A(0) = 0$. We use the shorthand notation 
$A(\tau,t) = A(t) - A(\tau)$. Similarly, the cumulative departures 
from a system are denoted $D(t)$. By definition $A(t), D(t) \in \mathcal{F}$ 
where $\mathcal{F} = \{f : f(t) \ge f(\tau) \ge 0, \forall t \ge \tau \ge 0, f(0) = 0\}$. 

The service guarantee of a system, e.g., a communications 
link, a channel, or an entire network, is expressed by a 
statistical service curve that provides a lower bound for the 
departures that may be violated with a defined probability. A 
system has service curve $S(t) \in \mathcal{F}$ with deficit profile $\varepsilon_S(\sigma)$ 
with $\sigma \ge 0$ if for all $t \ge 0$ it holds that 

$$P[D(t) < A \otimes S(t) - \sigma] \le \varepsilon_S(\sigma) \tag{1}$$

where $\otimes$ is the min-plus convolution defined for $t \ge 0$ as 

$$f \otimes g(t) := \inf_{\tau \in [0,t]} \{f(\tau) + g(t-\tau)\}.$$

Similarly, statistical envelopes provide upper bounds for the 
arrivals. The arrivals have envelope $E(t) \in \mathcal{F}$ with overflow 
profile $\varepsilon_E(\sigma)$ with $\sigma \ge 0$ if for all $t \ge 0$ it holds that 

$$P[A(t) > A \otimes E(t) + \sigma] \le \varepsilon_E(\sigma). \tag{2}$$

Using the definition of service curves and arrival envelopes, 
statistical backlog and delay bounds can be computed from the 
maximal vertical and horizontal deviation of $E(t)$ and $S(t)$, 
respectively. 
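To make the vertical and horizontal deviations concrete, the following is a minimal Python sketch on a finite discrete-time horizon. The function names and the example curves (an envelope with burst 3 and rate 1, a rate-latency service curve with rate 2 and latency 2, and a sample path `A`) are illustrative assumptions, not taken from the paper; curves are plain lists indexed by $t = 0, 1, \dots$.

```python
def min_plus_conv(f, g):
    """(f ⊗ g)(t) = min over 0 <= tau <= t of f(tau) + g(t - tau), for sequences indexed by t."""
    n = min(len(f), len(g))
    return [min(f[tau] + g[t - tau] for tau in range(t + 1)) for t in range(n)]


def vertical_deviation(E, S):
    """Backlog bound: max_t {E(t) - S(t)} over the common horizon."""
    return max(e - s for e, s in zip(E, S))


def horizontal_deviation(E, S):
    """Delay bound: max_t min{tau >= 0 : E(t) <= S(t + tau)}.

    S must extend far enough beyond the horizon of E; returns None otherwise.
    """
    worst = 0
    for t, e in enumerate(E):
        tau = next((k for k in range(len(S) - t) if e <= S[t + k]), None)
        if tau is None:
            return None
        worst = max(worst, tau)
    return worst


# Hypothetical curves: envelope with burst 3 and rate 1, rate-latency curve with rate 2, latency 2.
E = [0] + [3 + t for t in range(1, 20)]
S = [2 * max(t - 2, 0) for t in range(60)]
print(vertical_deviation(E, S), horizontal_deviation(E, S))  # 5 3

# A hypothetical sample path conforms to E if A(t) <= (A ⊗ E)(t) for all t, cf. (2) with sigma = 0.
A = [0, 4, 5, 5, 6, 7]
print(all(a <= c for a, c in zip(A, min_plus_conv(A, E))))  # True
```

Because the term at $\tau = t$ contributes $A(t) + E(0) = A(t)$, the convolution never exceeds $A(t)$, so conformance amounts to equality of the two sequences.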

In this work we use the concave and convex Legendre 
transforms of $E(t)$ and $S(t)$ defined for $c \ge 0$ as¹ 

$$\mathcal{L}_E(c) := \sup_{t \ge 0}\{E(t) - ct\}, \qquad \mathcal{L}_S(c) := \sup_{t \ge 0}\{ct - S(t)\}$$

to model sources and channels, respectively. Legendre transforms 
uniquely determine concave arrival envelopes and convex 
service curves and enjoy a number of useful properties 
in the network calculus [18]. The following Lem. 1 shows 
that backlog and delay bounds can be computed from $\mathcal{L}_E(c)$ 
and $\mathcal{L}_S(c)$ by a simple addition. The property of additivity is 
particularly useful as it allows composing results obtained for 
sources $E(t)$ and systems $S(t)$ independently. Lem. 1 extends 
an earlier deterministic result for backlogs from [18]. 

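Lemma 1 (Additivity of Legendre Transforms): Given a 
system with service curve $S(t)$ and deficit profile $\varepsilon_S(\sigma)$ and 
arrivals with envelope $E(t)$ and overflow profile $\varepsilon_E(\sigma)$. For 
any $c > 0$ and $\sigma_E, \sigma_S \ge 0$ it holds for the backlog $B$ that 

$$P[B > \mathcal{L}_E(c) + \mathcal{L}_S(c) + \sigma_E + \sigma_S] \le \varepsilon_E(\sigma_E) + \varepsilon_S(\sigma_S)$$

and, assuming FCFS order, it holds for the delay $W$ that 

$$P\left[W > \frac{\mathcal{L}_E(c) + \mathcal{L}_S(c) + \sigma_E + \sigma_S}{c}\right] \le \varepsilon_E(\sigma_E) + \varepsilon_S(\sigma_S).$$

Letting $\sigma = \sigma_E + \sigma_S$ we refer to $\varepsilon(\sigma) = \varepsilon_E(\sigma_E) + \varepsilon_S(\sigma_S)$ as 
the probability of error that can be minimized for $\sigma_E, \sigma_S \ge 0$ 
as $\varepsilon(\sigma) = \inf_{\sigma_E + \sigma_S = \sigma}\{\varepsilon_E(\sigma_E) + \varepsilon_S(\sigma_S)\} = \varepsilon_E \otimes \varepsilon_S(\sigma)$. 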

For the special case of a constant rate server with capacity 
$c$ we have $S(t) = ct$ with $\varepsilon_S(\sigma) = 0$ for $\sigma \ge 0$ such that 
$\mathcal{L}_S(c) = 0$. It follows from Lem. 1 that $\mathcal{L}_E(c) + \sigma_E$ is a 
backlog bound with probability of error $\varepsilon_E(\sigma_E)$, i.e., $\mathcal{L}_E(c)$ 
has the intuitive interpretation of a backlog bound for arrivals 
with envelope $E(t)$ at a constant rate server with capacity $c$. 
Similarly, $\mathcal{L}_S(c)$ is a backlog bound for constant rate arrivals 
with rate $c$ at the system $S(t)$. 

Proof: Given arrivals $A(t)$ and departures $D(t)$. The 
backlog of the system is $B(t) = A(t) - D(t)$. By substitution 
of (1) for $D(t)$ and (2) for $A(t)$ it follows for any $t \ge 0$ that 
$P[B > b] \le \varepsilon_E \otimes \varepsilon_S(\sigma)$ where $b = \sup_{t \ge 0}\{E(t) - S(t)\} + \sigma$ 
[18]. We rewrite $b = \sup_{t \ge 0}\{E(t) - ct + ct - S(t)\} + \sigma$ where 
$c > 0$. It follows that 

$$b \le \sup_{t \ge 0}\{E(t) - ct\} + \sup_{t \ge 0}\{ct - S(t)\} + \sigma$$

which completes the proof of the backlog bound. 

¹The Legendre transform is also referred to as Fenchel conjugate [32]. 
Strictly speaking the concave conjugate is defined as $\inf_{t \ge 0}\{ct - E(t)\} = -\sup_{t \ge 0}\{E(t) - ct\}$. 
We slightly adapt the definition for ease of exposition. 



The delay of the system is defined as the horizontal deviation 
$W(t) = \inf\{\tau \ge 0 : A(t) \le D(t+\tau)\}$. As above, it 
follows for any $t \ge 0$ that $P[W > d] \le \varepsilon_E \otimes \varepsilon_S(\sigma)$ where 
$d = \inf\{\tau \ge 0 : \sup_{t \ge 0}\{E(t) - S(t+\tau) + \sigma\} \le 0\}$. We 
rewrite 

$$d = \inf\{\tau \ge 0 : E(t) - c(t+\vartheta) + c(t+\vartheta) - S(t+\tau) + \sigma \le 0, \ \forall t \ge 0\}$$

where $c > 0$. We choose $\vartheta = \sup_{t \ge 0}\{E(t) - ct\}/c$ such that 
$E(t) - c(t+\vartheta) \le 0$ for all $t \ge 0$ and estimate 

$$d \le \inf\{\tau \ge 0 : c(t+\vartheta) - S(t+\tau) + \sigma \le 0, \ \forall t \ge 0\}.$$

After some reordering 

$$d \le \inf\{\tau \ge 0 : c(t+\tau) - S(t+\tau) + \sigma \le c(\tau - \vartheta), \ \forall t \ge 0\}$$

we arrive at 

$$d \le \inf\{\tau \ge 0 : (ct - S(t) + \sigma)/c + \vartheta \le \tau, \ \forall t \ge 0\}.$$

It follows that $\tau = \vartheta + \sup_{t \ge 0}\{ct - S(t)\}/c + \sigma/c$ and 

$$d \le \sup_{t \ge 0}\{E(t) - ct\}/c + \sup_{t \ge 0}\{ct - S(t)\}/c + \sigma/c$$

completes the proof of the delay bound. ■ 
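The additivity of Lem. 1 can be checked numerically. The sketch below evaluates both Legendre transforms over a finite horizon (an approximation of the supremum over $t \ge 0$) for the same hypothetical curves as before, a burst-3, rate-1 envelope and a rate-2, latency-2 service curve, and compares the sum with the exact vertical deviation. The function names are our own and the curves are illustrative assumptions.

```python
def legendre_E(E, c):
    """L_E(c) = sup_t {E(t) - c t}, evaluated over a finite horizon."""
    return max(e - c * t for t, e in enumerate(E))


def legendre_S(S, c):
    """L_S(c) = sup_t {c t - S(t)}, evaluated over a finite horizon."""
    return max(c * t - s for t, s in enumerate(S))


T = 60
E = [0] + [3 + t for t in range(1, T)]      # hypothetical envelope: burst 3, rate 1
S = [2 * max(t - 2, 0) for t in range(T)]   # hypothetical rate-latency curve: rate 2, latency 2

backlog = max(e - s for e, s in zip(E, S))  # exact vertical deviation
for c in (1.0, 1.5, 2.0):
    bound = legendre_E(E, c) + legendre_S(S, c)
    print(f"c={c}: backlog bound {bound}, delay bound {bound / c:.2f} (exact backlog {backlog})")
```

For every $c$ the sum of the two transforms upper-bounds the exact backlog. The free parameter $c$ trades the two terms: here $c = 1$, the envelope rate, makes the backlog bound tight, whereas $c = 2$, the service rate, gives the smallest delay bound among the three values.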

B. Construction of Envelopes 

We construct statistical envelopes as defined in (2) from the 
moment generating function (MGF) of the arrivals. We assume 
stationary arrivals, i.e., $P[A(\tau,\tau+t) > y] = P[A(t) > y]$ for 
any $y$ and all $\tau, t \ge 0$. The MGF of the arrivals is 

$$M_A(\theta,t) = \mathrm{E}\left[e^{\theta A(t)}\right]$$

where $\theta$ is a free parameter. Closely related is the concept of 
effective bandwidths defined for $\theta > 0$ as JS), [24] 

$$\alpha(\theta,t) = \frac{1}{\theta t}\ln M_A(\theta,t). \tag{3}$$



The effective bandwidth increases in $\theta > 0$ from the mean 
rate of the arrivals in an interval of length $t$ to their peak rate, 
providing an estimate of their capacity requirements. Given an 
aggregate of independent arrivals $A(t) = A_1(t) + A_2(t)$, the 
effective bandwidth $\alpha(\theta,t) = \alpha_1(\theta,t) + \alpha_2(\theta,t)$ is additive, 
since for the sum of independent random processes it holds 
that $M_A(\theta,t) = M_{A_1}(\theta,t)\,M_{A_2}(\theta,t)$. 
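Both properties, the growth from mean to peak rate and the additivity for independent arrivals, can be illustrated with a few lines of Python. The sketch assumes a hypothetical source with iid increments of 0 or 2 bit per slot (mean 1, peak 2), for which $\alpha(\theta,t)$ does not depend on $t$; distributions are represented as dictionaries `{value: probability}`, and all names are illustrative.

```python
import math
from itertools import product


def mgf(dist, theta):
    """M(theta) = E[exp(theta X)] for a finite distribution given as {value: probability}."""
    return sum(p * math.exp(theta * x) for x, p in dist.items())


def eff_bw(dist, theta):
    """alpha(theta) = ln M(theta) / theta; for iid increments this equals alpha(theta, t) for every t."""
    return math.log(mgf(dist, theta)) / theta


def convolve(d1, d2):
    """Distribution of X1 + X2 for independent X1 ~ d1 and X2 ~ d2."""
    out = {}
    for (x1, p1), (x2, p2) in product(d1.items(), d2.items()):
        out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out


on_off = {0: 0.5, 2: 0.5}   # hypothetical source: 0 or 2 bit per slot

for theta in (1e-4, 0.5, 5.0, 50.0):
    print(theta, eff_bw(on_off, theta))   # increases from about 1 (mean) towards 2 (peak)

# Additivity for two independent copies: alpha of the sum equals the sum of the alphas.
theta = 0.5
print(eff_bw(convolve(on_off, on_off), theta), 2 * eff_bw(on_off, theta))
```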

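From Chernoff's theorem $P[Y > y] \le e^{-\theta y} M_Y(\theta)$ for $\theta > 0$ 
an upper bound on the arrivals follows as 

$$P[A(t) > F(t) + \varsigma] \le e^{-\theta(F(t)+\varsigma)} M_A(\theta,t) = \kappa e^{-\theta\varsigma} \tag{4}$$

where we chose to equate the right hand side with $\kappa e^{-\theta\varsigma}$ with 
parameters $\kappa \in (0,1]$ and $\varsigma \ge 0$. We solve for $F(t)$ and obtain 

$$F(t) = t\,\alpha(\theta,t) - \frac{\ln\kappa}{\theta}. \tag{5}$$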



By construction $F(t)$ is an envelope for $A(t)$ that is violated 
at most with probability $\kappa e^{-\theta\varsigma}$ for any $t \ge 0$. It does, however, 
not satisfy the definition from (2) that requires a sample path 
argument for all $t \ge 0$. We rewrite (2) as 

$$P[A(t) > A \otimes E(t) + \sigma] = P[\exists \tau \in [0,t] : A(\tau,t) > E(t-\tau) + \sigma] \tag{6}$$

and obtain from the union bound that 

$$P[A(t) > A \otimes E(t) + \sigma] \le \sum_{\tau=0}^{t-1} P[A(\tau,t) > E(t-\tau) + \sigma]$$

where we used that the addend at $\tau = t$ is zero since $E(0) + \sigma \ge 0$ 
and by definition $A(t,t) = 0$. 

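We select $E(t) = F(t) + \delta t$ where $F(t)$ is given in (5) and 
$\delta > 0$ is a free parameter. By substitution of $\varsigma = \sigma + \delta t$ we 
obtain from (4) that $P[A(t) > E(t) + \sigma] \le \kappa e^{-\theta(\sigma+\delta t)}$ and, 
for $A(t)$ stationary, 

$$P[A(t) > A \otimes E(t) + \sigma] \le \kappa e^{-\theta\sigma} \sum_{\tau=0}^{t-1} e^{-\theta\delta(t-\tau)}.$$

For any $t \ge 0$ we estimate $\sum_{\tau=0}^{t-1} e^{-\theta\delta(t-\tau)} \le \sum_{\tau=1}^{\infty} e^{-\theta\delta\tau}$. 
Since $e^{-\theta\delta\tau}$ is decreasing in $\tau$ we can bound each summand 
by $e^{-\theta\delta\tau} \le \int_{\tau-1}^{\tau} e^{-\theta\delta x}\,dx$ to arrive at 

$$P[A(t) > A \otimes E(t) + \sigma] \le \kappa e^{-\theta\sigma} \int_0^{\infty} e^{-\theta\delta x}\,dx = \frac{\kappa e^{-\theta\sigma}}{\theta\delta}.$$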



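Using the definition of envelope (2) we equate $\varepsilon_E(\sigma) = \kappa e^{-\theta\sigma}/(\theta\delta)$. 
Without loss of generality we choose $\varepsilon_E(0) = 1$ 
and solve for $\kappa = \theta\delta$ where $\delta \le 1/\theta$ such that $\kappa \le 1$. By 
insertion of $\kappa$ into (5) we derive from $E(t) = F(t) + \delta t$ 
that $E(t) = (\alpha(\theta,t) + \delta)t - \ln(\theta\delta)/\theta$ has overflow profile 
$\varepsilon_E(\sigma) = e^{-\theta\sigma}$ and find the Legendre transform 

$$\mathcal{L}_E(c) = \sup_{t \ge 0}\{(\alpha(\theta,t) + \delta - c)t\} - \frac{\ln(\theta\delta)}{\theta}. \tag{7}$$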
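The construction leading to (7) can be sanity-checked by simulation. The following Monte Carlo sketch rests on assumptions of our own, not the paper's: an iid source emitting 0 or 2 bit per slot with equal probability, $\theta = 0.5$, $\delta = 0.2$, a horizon of $t = 200$ slots and 2000 seeded runs. It estimates the probability that the sample path condition in (2) is violated by more than $\sigma$ and compares it with the overflow profile $e^{-\theta\sigma}$. Because of the union bound the empirical rate should stay well below the bound.

```python
import math
import random


def check_envelope(values, probs, theta, delta, sigma, t=200, runs=2000, seed=1):
    """Empirical P[sup_tau {A(tau, t) - E(t - tau)} > sigma] versus the overflow profile e^(-theta sigma).

    Assumes iid increments, so alpha(theta, t) = alpha(theta), and E(t) = (alpha + delta) t - ln(theta delta)/theta.
    """
    if not 0 < delta <= 1 / theta:
        raise ValueError("the construction requires delta in (0, 1/theta]")
    alpha = math.log(sum(p * math.exp(theta * v) for v, p in zip(values, probs))) / theta
    burst = -math.log(theta * delta) / theta          # E(t) - (alpha + delta) t for t > 0
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        increments = rng.choices(values, weights=probs, k=t)
        acc, worst = 0.0, 0.0                          # tau = t contributes A(t, t) - E(0) = 0
        for width, inc in enumerate(reversed(increments), start=1):
            acc += inc                                 # acc = A(t - width, t)
            worst = max(worst, acc - ((alpha + delta) * width + burst))
        hits += worst > sigma
    return hits / runs, math.exp(-theta * sigma)


theta, delta = 0.5, 0.2
sigma = -math.log(0.1) / theta                         # overflow profile equals 0.1 at this sigma
empirical, bound = check_envelope([0, 2], [0.5, 0.5], theta, delta, sigma)
print(empirical, "<=", bound)
```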



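For a deterministic constant rate server with capacity $c$ it 
holds that $\mathcal{L}_S(c) = 0$ with deficit profile $\varepsilon_S(\sigma) = 0$ for $\sigma \ge 0$. 
It follows from Lem. 1 that $P[B > \mathcal{L}_E(c) + \sigma] \le e^{-\theta\sigma}$, i.e., 
$\mathcal{L}_E(c) + \sigma$ is a backlog bound with exponentially decaying 
probability of error $\varepsilon = e^{-\theta\sigma}$. The parameters $\theta > 0$ 
and $\delta \in (0, 1/\theta]$ can be optimized to minimize backlog, 
respectively, delay bounds. Given $\varepsilon$ we can solve $\varepsilon = e^{-\theta\sigma}$ 
for $\sigma = -\ln\varepsilon/\theta$ and derive the minimal backlog bound 

$$b = \inf_{\theta > 0}\left\{\mathcal{L}_E(c) - \frac{\ln\varepsilon}{\theta}\right\}.$$

A minimal delay bound follows as 

$$d = \inf_{\theta > 0}\left\{\frac{\mathcal{L}_E(c)}{c} - \frac{\ln\varepsilon}{\theta c}\right\}. \tag{8}$$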



Remark on Related Envelope Models: Using the Legendre 
transform (7) formalizes a backlog bound that can also be 
derived from the exponentially bounded burstiness model [39] 
$P[A(t) > \rho t + \sigma] \le \kappa e^{-\theta\sigma}$ for $t \ge 0$. By application of the 
union bound as above, a backlog bound for a constant rate 
server with capacity $c$ is $P[B > \sigma] \le \kappa e^{-\theta\sigma}/(\theta\delta)$ where 
$c = \rho + \delta$. Choosing $\kappa = \sup_{\tau \ge 0}\{M_A(\theta,\tau)e^{-\theta\rho\tau}\}$, that is, the 
optimal solution from Chernoff's theorem, the two backlog 
bounds can be converted into one another. 

We note that a similar result can be obtained by approximating 
(6) by its largest term, $P[A(t) > A \otimes E(t) + \sigma] \approx 
\sup_{\tau \in [0,t]}\{P[A(\tau,t) > E(t-\tau) + \sigma]\}$, which strictly provides 
only a lower bound. Letting $E(t) = F(t)$ from (5), where 
$\mathcal{L}_F(c) = \sup_{t \ge 0}\{(\alpha(\theta,t) - c)t\}$ at $\kappa = 1$, yields that $\mathcal{L}_F(c) + \sigma$ 
is a backlog bound that is violated approximately with probability $e^{-\theta\sigma}$. 
In comparison, (7) trades the slack rate $\delta$ to achieve a true 
upper bound. 

Fig. 1. Unified system model. A source generates symbols according to a 
defined random process. The symbols are encoded and transmitted as arrivals 
$A(t)$ by a queueing network. The network departures $D(t)$ are decoded and 
delivered to the sink. 

III. Source Models and Source Coders 

In this section we investigate the performance of a networked 
information source. An example of a relevant system 
is shown in Fig. 1, where the symbols of a source are 
encoded and transmitted by a network. Our aim is to combine 
information- and queueing-theoretic aspects to identify achievable 
operating points within the capacity-delay-error-space of 
the joint system, i.e., given a network with service curve $S(t)$, 
e.g., in the simplest case $S(t) = ct$, can the system achieve 
a delay bound $d$ with probability of error of at most $\varepsilon$? 

We specify the detailed system model below. Consider a 
random variable $X$ that can take any of the values $x_i$, also 
called symbols, with probability $p_i$. We also refer to $X$ 
as the alphabet of the source and denote $|X|$ its cardinality. 
Information theory defines that if the event $X = x_i$ occurs, it 
provides information $I(x_i) = -\mathrm{ld}\,p_i$ bit, where ld denotes the 
logarithm dualis, i.e., with base 2. The expected information 
becomes $H_X := -\sum_i p_i\,\mathrm{ld}\,p_i$, which is defined as the entropy of 
$X$. We label successive symbols generated by a discrete source 
by $n \in \mathbb{N}$. The stochastic process $X(n)$ has entropy rate 
$\mathcal{H}_X = \lim_{n \to \infty} H(X(1), X(2), \dots, X(n))/n$, i.e., $\mathcal{H}_X$ is the entropy 
per symbol. For stationary processes the entropy rate equals 
$\mathcal{H}_X = \lim_{n \to \infty} H(X(n) \mid X(n-1), X(n-2), \dots, X(1))$ [11]. 
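For readers who want to evaluate these definitions, here is a short Python sketch. The helper names are ours. For the entropy rate it assumes a stationary Markov source (irreducible and aperiodic, so that power iteration converges), for which the conditional entropy in the limit reduces to $H(X(2) \mid X(1))$. The two-state transition matrix is a hypothetical example.

```python
import math


def information(p):
    """I(x_i) = -ld p_i in bit."""
    return -math.log2(p)


def entropy(probs):
    """H_X = -sum_i p_i ld p_i in bit; zero-probability symbols contribute nothing."""
    return sum(p * information(p) for p in probs if p > 0)


def markov_entropy_rate(P, iters=1000):
    """Entropy rate of a stationary Markov chain with transition matrix P (rows sum to one):
    lim H(X(n) | X(n-1), ..., X(1)) = H(X(2) | X(1)) = sum_i pi_i H(P[i]).
    Assumes an irreducible, aperiodic chain so that power iteration converges to pi."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return sum(pi[i] * entropy(P[i]) for i in range(n))


print(entropy([3/8, 2/8, 1/8, 1/8, 1/8]))              # about 2.156 bit
print(markov_entropy_rate([[0.9, 0.1], [0.5, 0.5]]))    # hypothetical two-state chain
```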

We assign a number of bits $l_i$ to each symbol $x_i$ and define 
the function $l$ to map $x_i$ to $l_i$. Accordingly, $L(n) = l(X(n))$ 
defines a random process of bit lengths that are generated by 
the symbol process $X(n)$. As $L(n)$ is an increment process, we 
obtain the cumulative arrival process as $A(n) = \sum_{\nu=1}^{n} L(\nu)$. 
We let $A(0) = 0$ by definition. 

Shannon established the entropy of a source as a fundamental 
limit for lossless data compression. To this end, a code 
maps symbols $x_i$ to unique codewords of length $l_i$, where 
the compression gain is due to assigning short codewords 
to frequent symbols. If no codeword is a prefix of any 
other codeword, the code is referred to as a prefix code, 
where each codeword can be decoded on its own. For an 
optimal code the expected codeword length $\bar{l} = \sum_i p_i l_i$ is 
bounded in an interval of one bit width by the entropy as 

$$H_X \le \bar{l} < H_X + 1.$$

In the next sections we first investigate the non-equilibrium 
behavior of memoryless as well as Markov sources and 
show examples for finite and infinite alphabets. Second, 
we analyze the performance of well-known coders, such as 
the Huffman coder, Shannon coder, and Lempel-Ziv coder. 
Without loss of generality we restrict our investigation to 
binary codes. 

A. Memoryless Sources 

We start our investigation with the basic memoryless source 
where the symbols $X(n)$ are independent and identically 
distributed (iid). From the memorylessness it follows that the 
entropy rate of the process equals the entropy of a single 
symbol, i.e., $\mathcal{H}_X = H_X$. We use the function $l$ to assign a number 
of bits $l_i$ to each symbol $x_i$. By definition $L(n) = l(X(n))$ 
has a categorical distribution with MGF 

$$M_L(\theta) = \sum_i p_i e^{\theta l_i}. \tag{9}$$

For the cumulative arrival process $A(n) = \sum_{\nu=1}^{n} L(\nu)$ it 
follows that $M_A(\theta,n) = (M_L(\theta))^n$, where the numbers of occurrences 
of the individual symbols are multinomially distributed. Assuming 
a source that emits symbols at a constant rate of one symbol 
per timeslot we substitute $n = t$. We relax this assumption 
in Sec. III-E. We equate $l_i = -\mathrm{ld}\,p_i$ such that $M_A(\theta,t)$ is 
the MGF of the number of information bits of all symbols 
generated up to time $t$. From (3) we derive 

$$\alpha(\theta) = \frac{1}{\theta}\ln\left(\sum_i p_i^{1-\theta/\ln 2}\right) \tag{10}$$

which does not depend on $t$ due to the memorylessness of the 
source. An upper envelope on the number of information bits 
generated by the source up to time $t$ that is violated at most 
with probability $\kappa$ follows immediately from (5), where $\theta > 0$ 
is a free parameter that can be optimized. 
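As an illustration of (10) and (5), the following Python sketch computes the information envelope of a memoryless source and minimizes it over a grid of $\theta$ for each $t$. The five-symbol distribution, the violation probability $\kappa = 10^{-6}$ and the $\theta$ grid are hypothetical choices made for this example only.

```python
import math

LN2 = math.log(2)


def alpha_info(probs, theta):
    """Eq. (10): alpha(theta) = ln(sum_i p_i^(1 - theta/ln 2)) / theta, i.e., l_i = -ld p_i."""
    return math.log(sum(p ** (1 - theta / LN2) for p in probs)) / theta


def info_envelope(probs, t, kappa, thetas):
    """Eq. (5) with alpha from (10): F(t) = t alpha(theta) - ln(kappa)/theta, minimized over a theta grid."""
    return min(t * alpha_info(probs, th) - math.log(kappa) / th for th in thetas)


probs = [3/8, 2/8, 1/8, 1/8, 1/8]                      # hypothetical finite alphabet
thetas = [k / 1000 for k in range(1, 3001)]            # hypothetical search grid for theta
for t in (10, 100, 1000, 10000):
    print(t, info_envelope(probs, t, 1e-6, thetas) / t)   # decreases towards H_X as t grows
```

Since each fixed $\theta$ yields a valid bound at time $t$, taking the minimum over $\theta$ separately for every $t$ preserves the violation probability $\kappa$ per time instant.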

The envelope provides a benchmark that can be interpreted 
as a statistical non-equilibrium bound on the number of bits 
generated by a (hypothetical) optimal coder that maps symbols 
$x_i$ to codewords of lengths $l_i = -\mathrm{ld}\,p_i$. The coder is optimal 
in the sense that its average codeword length equals the 
entropy of the source. In practice, this may not be achievable 
since $-\mathrm{ld}\,p_i$ typically is non-integer. For comparison, the 
Shannon code has $l_i = \lceil -\mathrm{ld}\,p_i \rceil$. 

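Geometrically Distributed Symbols: Assume an infinite 
alphabet with geometrically distributed symbols $p_i = p(1-p)^i$ 
for $i \ge 0$. The entropy rate follows by insertion and application 
of the geometric sum as 

$$\mathcal{H}_X = -\frac{p\,\mathrm{ld}\,p + (1-p)\,\mathrm{ld}(1-p)}{p}.$$

Similarly, $\alpha(\theta)$ follows from (10) for $0 < \theta < \ln 2$ as 

$$\alpha(\theta) = \frac{1}{\theta}\ln\left(\frac{p^{1-\theta/\ln 2}}{1 - (1-p)^{1-\theta/\ln 2}}\right). \tag{11}$$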
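The closed forms can be cross-checked against a truncated version of the infinite sum in (10). In the sketch below the function names and the truncation at 5000 terms are our own choices; the entropy values it produces agree with the numbers quoted for Fig. 2.

```python
import math

LN2 = math.log(2)


def geom_entropy(p):
    """H_X = -(p ld p + (1-p) ld(1-p)) / p for p_i = p (1-p)^i, i >= 0."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p)) / p


def geom_alpha(p, theta):
    """Eq. (11): closed form of (10) for geometric symbols, valid for 0 < theta < ln 2."""
    e = 1 - theta / LN2
    return math.log(p ** e / (1 - (1 - p) ** e)) / theta


def geom_alpha_series(p, theta, n_terms=5000):
    """Eq. (10) evaluated on a truncated alphabet i < n_terms, for cross-checking."""
    e = 1 - theta / LN2
    return math.log(sum((p * (1 - p) ** i) ** e for i in range(n_terms))) / theta


for p in (0.25, 0.5, 0.75):
    print(p, geom_entropy(p), geom_alpha(p, 0.3), geom_alpha_series(p, 0.3))
```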



We show respective envelopes from (5) for $p = 0.25$, $0.5$, 
and $0.75$ in Fig. 2. The corresponding entropy rates are $\mathcal{H}_X \approx 
3.25$, $2$, and $1.08$ bit, respectively. The violation probability 
of the information envelopes is $\kappa$ = 10^^. We normalized 
the envelopes by the corresponding entropy, i.e., we plot 
$F(t)/\mathcal{H}_X$. Accordingly, the black line with slope one is the expected 
normalized information by time $t$. The non-equilibrium 
information envelopes show a significant deviation from the 
expected value. The non-linearity of the envelopes arises after 
minimization of (5) over $\theta > 0$ for any point in time $t > 0$. 
To see the convergence in equilibrium we also depict the 
increments of the envelopes, which have the interpretation of an 
information rate, as well as the respective entropy rates $\mathcal{H}_X$. 
While the increments of the envelope deviate largely from the 
entropy on short time scales, they converge quickly if longer 
time intervals are considered. 

[Fig. 2 plots: top, normalized information envelope $F(t)/\mathcal{H}_X$ versus time $t$; bottom, envelope increments versus time $t$; curves for $p = 0.25$ ($\mathcal{H}_X = 3.25$), $p = 0.5$ ($\mathcal{H}_X = 2$), and $p = 0.75$ ($\mathcal{H}_X = 1.08$).] 

Fig. 2. Information envelopes of a memoryless source with geometrically 
distributed symbols with parameter $p$. The envelopes show that the actual 
information rate can be significantly larger than the entropy rate (slope one in 
the top figure, respectively, horizontal lines in the bottom figure). It converges, 
however, quickly if longer time intervals are considered. 

B. Huffman Coding 

Next, we consider envelopes for the number of bits gen- 
erated by a Huffman coder and derive performance bounds. 
To construct a Huffman code execute the following steps 
repeatedly until all symbols of the source have been processed: 

• sort the symbols in decreasing order of probability, 

• substitute the two least probable symbols by a new com- 
pound symbol, assign the sum of the two probabilities, 
and add one bit to the respective codewords to distinguish 
the individual symbols. 
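The merge steps above translate directly into a priority queue. This is a minimal Python sketch (not the paper's implementation) that returns only the codeword lengths $l_i$, which is all the envelope analysis needs. Ties between equal probabilities may be resolved differently than in the text, yielding a different but equally optimal set of lengths.

```python
import heapq
from itertools import count


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code obtained by repeatedly merging the two least probable nodes."""
    if len(probs) == 1:
        return [1]
    tie = count()                                   # tie-breaker so heapq never compares the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)             # the two least probable (compound) symbols
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1                         # one more bit distinguishes the two groups
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


p5 = [3/8, 2/8, 1/8, 1/8, 1/8]
l5 = huffman_lengths(p5)
print(l5, sum(p * l for p, l in zip(p5, l5)))       # expected codeword length 2.25
```

For the five-symbol example of Sec. III-C this tie-breaking yields lengths (2, 2, 3, 3, 2) rather than (1, 2, 3, 4, 4). Both sets achieve $\bar{l} = 2.25$, but they have different variability, which matters for the delay bounds discussed below.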

The Huffman prefix code achieves the minimal expected 
codeword length, hence $\mathcal{H}_X \le \bar{l} < \mathcal{H}_X + 1$. Regarding 
the individual codeword lengths $l_i$, however, no such simple 
upper bound exists. In fact, it is shown in [23] that individual 
codewords of a Huffman code can become as large as approximately 
1.44 times the information of the corresponding 
symbol, i.e., $l_i < -1.45\,\mathrm{ld}\,p_i$. Compared to the information 
envelope where $l_i = -\mathrm{ld}\,p_i$, e.g., Fig. 2, the actual codeword 
lengths of a Huffman coder may significantly increase the 
number of bits generated. 

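We characterize source coders by their capacity-delay-error-tradeoff, 
i.e., by $(c, d, \varepsilon)$ where $d = \mathcal{L}_E(c)/c - \ln\varepsilon/(\theta c)$ for 
any $\theta > 0$ from (8). Assuming a memoryless source we first 
obtain $\alpha(\theta)$ for the coder from (9) as 

$$\alpha(\theta) = \frac{1}{\theta}\ln\left(\sum_i p_i e^{\theta l_i}\right).$$

Since $\alpha(\theta)$ does not depend on $t$, the condition $c > \alpha(\theta)$ is 
sufficient to achieve finite $\mathcal{L}_E(c)$ from (7). We choose the 
free parameter $\delta \in (0, 1/\theta]$ as $\delta = c - \alpha(\theta)$. It follows that 
$\mathcal{L}_E(c) = -\ln(\theta\delta)/\theta$ and we obtain from (8) that 

$$d = \inf_{\theta > 0}\left\{-\frac{\ln\left(\theta(c - \alpha(\theta))\varepsilon\right)}{\theta c}\right\} \tag{12}$$

is a delay bound with error probability $\varepsilon$. 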
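A minimal numerical sketch of (12), assuming a grid search over $\theta$ in place of an exact minimization: it keeps only those $\theta$ for which $\delta = c - \alpha(\theta)$ lies in $(0, 1/\theta]$, as required above, and returns infinity if none qualifies. The codes are those of the five-symbol example in Sec. III-C; the grid and $\varepsilon = 10^{-6}$ are hypothetical choices.

```python
import math


def alpha_code(probs, lengths, theta):
    """alpha(theta) = ln(sum_i p_i e^(theta l_i)) / theta for a memoryless source and a given code."""
    return math.log(sum(p * math.exp(theta * l) for p, l in zip(probs, lengths))) / theta


def delay_bound(probs, lengths, c, eps, thetas):
    """Eq. (12) minimized over a grid of theta, with delta = c - alpha(theta) required in (0, 1/theta].

    Returns math.inf if no theta on the grid is feasible, e.g., if c does not exceed the mean length.
    """
    best = math.inf
    for th in thetas:
        delta = c - alpha_code(probs, lengths, th)
        if 0 < delta <= 1 / th:
            best = min(best, -math.log(th * delta * eps) / (th * c))
    return best


probs = [3/8, 2/8, 1/8, 1/8, 1/8]
huffman = [1, 2, 3, 4, 4]                      # mean length 2.25
shannon = [2, 2, 3, 3, 3]                      # mean length 2.375
thetas = [k / 1000 for k in range(1, 3001)]    # hypothetical search grid
eps = 1e-6                                     # hypothetical error probability
for c in (2.3, 2.5, 3.0, 3.5):
    print(c, delay_bound(probs, huffman, c, eps, thetas), delay_bound(probs, shannon, c, eps, thetas))
```

By Jensen's inequality $\alpha(\theta) \ge \bar{l}$ for every $\theta > 0$, so the bound stays infinite whenever $c \le \bar{l}$. This is why the Shannon code, with $\bar{l} = 2.375$, has no finite bound at $c = 2.3$.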

The $(c, d, \varepsilon)$-tradeoff expresses the capacity that is required 
to achieve a delay bound subject to a defined probability of 
error. The delays are due to the randomness that is introduced 
by variable codeword lengths. Depending on the amount of 
buffering in the network, the error can be a violation of the 
delay bound or a loss of information due to buffer overflow. 
As an implementation option, the coder itself can use the envelopes 
to proactively discard excess data, which occurs at most with 
probability $\varepsilon$, such that the delay bound is not violated. In the 
limit $\theta \to 0$, i.e., permitting arbitrarily large delays $d \to \infty$, 
we recover that a capacity of $c \to \bar{l}$ bit per timeslot suffices 
to transmit the symbols of the source with arbitrarily small 
probability of error $\varepsilon \to 0$. 

Geometrically Distributed Symbols: As for Fig. 2, assume an 
infinite alphabet with geometrically distributed symbols $p_i = 
p(1-p)^i$ for $i \ge 0$. We let $p = 1/2$ to obtain a dyadic source 
where $-\mathrm{ld}\,p_i = i + 1$ is integer. The respective Huffman code 
uses codewords of lengths $l_i = i + 1$ such that $\alpha(\theta)$ for the 
Huffman coder is identical to (11) in this case. Given $c$ and $\varepsilon$ 
we compute $d$ as described above and optimize $\theta \in (0, \ln 2)$ 
numerically. The entropy rate of the source is $\mathcal{H}_X = 2$ and 
the expected codeword length is $\bar{l} = 2$. 

Fig. 3 depicts the $(c, d, \varepsilon)$-tradeoff of the Huffman coded 
source. For $c > \bar{l}$ finite delay bounds can be computed, 
whereas the delay grows unbounded for $c \to \bar{l}$. Also, Fig. 3 
shows the logarithmic growth of $d$ for decaying $\varepsilon$ that is 
characteristic of the approach. 

C. Shannon Coding 

Shannon coding works as follows. Assume all symbols $x_i$ 
are ordered in decreasing order of their probabilities, i.e., $p_i \ge 
p_{i+1}$. Denote $F_i = \sum_{j<i} p_j$ the cumulative probability of all 
symbols $x_j$ where $j < i$. The first $\lceil -\mathrm{ld}\,p_i \rceil$ positions after the 
binary point of the binary number $F_i$ are the codeword of 
the symbol $x_i$. 
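The construction can be written out as a short Python sketch. It uses exact rational arithmetic (`fractions.Fraction`) so that the binary expansion of $F_i$ is not distorted by floating-point rounding; the function name is our own.

```python
import math
from fractions import Fraction


def shannon_code(probs):
    """Shannon code: codeword of x_i = first ceil(-ld p_i) bits after the binary point of F_i."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])   # decreasing probability, stable for ties
    codes = [None] * len(probs)
    F = Fraction(0)                                           # cumulative probability of preceding symbols
    for i in order:
        p = Fraction(probs[i])
        length = math.ceil(-math.log2(float(p)))
        frac, bits = F, []
        for _ in range(length):                               # binary expansion of F_i, digit by digit
            frac *= 2
            bit = 1 if frac >= 1 else 0
            bits.append(str(bit))
            frac -= bit
        codes[i] = "".join(bits)
        F += p
    return codes


probs = [Fraction(3, 8), Fraction(2, 8), Fraction(1, 8), Fraction(1, 8), Fraction(1, 8)]
print(shannon_code(probs))   # ['00', '01', '101', '110', '111']
```

For the five-symbol example of Sec. III-C this gives codeword lengths (2, 2, 3, 3, 3), and no codeword is a prefix of another.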

While the Huffman code is optimal with respect to the 
expected codeword length, certain codewords may exceed the 
information of the respective symbol significantly. In contrast, 
the Shannon code achieves codeword lengths $l_i = \lceil -\mathrm{ld}\,p_i \rceil$ 
that deviate from the information of any symbol by less than 
one. While the Shannon code does not generally achieve 
the minimal expected codeword length, it enjoys, however, a 
property referred to as competitive optimality, i.e., given a 
randomly selected symbol of a dyadic source, the codeword 
generated by the Shannon coder is more likely to be shorter 
than longer if compared to the codeword generated by any 
other coder [11]. 

[Fig. 3 plot, horizontal axis: capacity $c$.] 

Fig. 3. Capacity-delay-error-tradeoff of a Huffman coded dyadic source with 
geometric symbol distribution and entropy rate $\mathcal{H}_X = 2$. The delay grows 
unbounded if the capacity approaches the expected codeword length $\bar{l} = 2$. 

To compute the $(c, d, \varepsilon)$-tradeoff we estimate the codeword 
lengths $l_i = \lceil -\mathrm{ld}\,p_i \rceil < 1 - \mathrm{ld}\,p_i$. It follows that $\bar{l} < H_X + 1$ 
and from (9) that $M_L(\theta) < e^{\theta}\sum_i p_i^{1-\theta/\ln 2}$, such that for a 
memoryless source 

$$\alpha(\theta) < \frac{1}{\theta}\ln\left(\sum_i p_i^{1-\theta/\ln 2}\right) + 1. \tag{13}$$

As in Sec. III-B we obtain $d$ from (12). Since we compute 
bounds, we can substitute $\alpha(\theta)$ by the upper bound (13) to 
obtain a conservative estimate. 

Closely related to Shannon coding is Shannon-Fano-Elias 
coding, which motivates arithmetic coding. The respective codewords 
are of lengths $l_i = \lceil -\mathrm{ld}\,p_i \rceil + 1$. Using the estimate 
$l_i < 2 - \mathrm{ld}\,p_i$ the solution follows as above. 

Impact of Codeword Lengths: Assume a source 
that has an alphabet of five symbols with probabilities 
$(3/8, 2/8, 1/8, 1/8, 1/8)$ and $\mathcal{H}_X \approx 2.156$. The codeword lengths 
of the Shannon code are $(2, 2, 3, 3, 3)$ such that $\bar{l} = 2.375$. For 
comparison, the codeword lengths of the Huffman code are 
$(1, 2, 3, 4, 4)$ with $\bar{l} = 2.25$. We compare the $(c, d, \varepsilon)$-tradeoff 
of the two coders in Fig. 4, where we optimize the parameter 
$\theta$ numerically. We choose $\varepsilon$ = 10^® and omit showing further 
results since $d$ decreases logarithmically with increasing $\varepsilon$ as 
before, see Fig. 3. 

Fig. 4 shows an advantage of the Huffman code compared 
to the Shannon code if $c \lesssim 2.4$, which is due to the fact that 
the Huffman code achieves the minimal expected codeword 
length, whereas the Shannon code does not. If $c \gtrsim 2.4$, 
however, the Shannon code outperforms the Huffman code 
in terms of the delay due to the smaller variability of the 
codeword lengths. Since the maximum codeword length of 
the Shannon code is $\max_i\{l_i\} = 3$ we have $d = 0$ for $c \ge 3$. 
For the Huffman code we have $\max_i\{l_i\} = 4$ such that $d = 0$ 
for $c \ge 4$. 

Fig. 4. $(c, d, \varepsilon)$-tradeoff of the Shannon coder compared to the Huffman 
coder. Unlike the Huffman coder, the Shannon coder does not achieve the 
minimal expected codeword length. Due to the more balanced codeword 
lengths, the Shannon coder achieves, however, a significantly smaller delay 
bound if sufficient capacity is available. 

D. Lempel-Ziv Coding 

The term Lempel-Ziv coding refers to dictionary-based 
codes that encode a symbol or a sequence of s symbols 
by a reference to a previous occurrence. Compared to the 
codes above, the advantage of Lempel-Ziv coding is that it 
adapts to the source without a priori knowledge of the symbol 
distribution. Moreover, it is asymptotically optimal [38]. Here, 
we consider window-based Lempel-Ziv coding where the 
window contains the past $w$ symbols. If the current symbol 
is found in the history, it is replaced by a pointer to the latest 
occurrence. This coder is proven to be optimal if the sequence 
length $s$ and the history $w$ tend to infinity pTj. 

We consider practical implementations with finite $w$ and 
transmit symbols that cannot be found in the history uncompressed. 
Assuming a memoryless source $X$ with symbols $x_i$ 
that occur each with probability $p_i$, the recurrence time $k$ of 
symbol $x_i$ is geometrically distributed 

$$f_i(k) = p_i(1-p_i)^{k-1}.$$

To encode the pointer we use Elias-delta coding, which encodes 
positive integers $j \ge 1$, see [33]. We use the codeword for $j = 1$, 
which has a length of one bit, as a prefix to mark uncompressed 
symbols and $j \ge 2$ to denote the $k = (j-1)$th most recent 
symbol in the window. The length of the codewords is [33] 

$$l(k) = \lfloor \mathrm{ld}(k+1) \rfloor + 2\lfloor \mathrm{ld}(\mathrm{ld}(k+1) + 1) \rfloor + 1$$

and $\alpha(\theta)$ follows from the definition (3) as 

$$\alpha(\theta) = \frac{1}{\theta}\ln\left(\sum_i p_i\left(\sum_{k=1}^{w} p_i(1-p_i)^{k-1}e^{\theta l(k)} + (1-p_i)^w e^{\theta(\lceil \mathrm{ld}|X| \rceil + l(0))}\right)\right). \tag{14}$$



The first part of the sum originates from encoding the pointer 
if the symbol is found in the history. The second part denotes 
the probability that the symbol does not occur in the history, 
such that the symbol is sent without compression, requiring 
$\lceil \mathrm{ld}|X| \rceil$ bit to encode the symbol, where $|X|$ is the cardinality 
of $X$, plus $l(0) = 1$ bit to mark the symbol as uncompressed. In 
the sequel we limit the maximum length of encoded pointers 
to $\lceil \mathrm{ld}|X| \rceil$ such that $w = \max\{k : l(k) \le \lceil \mathrm{ld}|X| \rceil\}$. 
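The codeword length $l(k)$ and the resulting window size $w$ are easy to compute exactly with integer bit operations, because $\lfloor \mathrm{ld}\, j \rfloor$ equals `j.bit_length() - 1` for $j \ge 1$. The Python sketch below (helper names are ours) evaluates them. For $|X| = 256$, i.e., pointers of at most 8 bit, it gives $w = 14$, and for $|X| = 2^{16}$ it gives $w = 1022$.

```python
def elias_delta_length(j):
    """Length in bit of the Elias-delta codeword of an integer j >= 1."""
    n = j.bit_length() - 1                          # floor(ld j)
    return n + 2 * ((n + 1).bit_length() - 1) + 1   # n + 2 floor(ld(n + 1)) + 1


def pointer_length(k):
    """l(k): the k-th most recent symbol uses the codeword of j = k + 1; l(0) = 1 marks raw symbols."""
    return elias_delta_length(k + 1)


def window_size(alphabet_size):
    """w = max{k : l(k) <= ceil(ld |X|)}; l(k) is non-decreasing in k."""
    limit = (alphabet_size - 1).bit_length()        # ceil(ld |X|) for |X| >= 1
    k = 0
    while pointer_length(k + 1) <= limit:
        k += 1
    return k


print([pointer_length(k) for k in range(8)], window_size(256), window_size(2 ** 16))
```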

By insertion of $l(k)$, (14) becomes a sum of polylogarithms 
such that we cannot provide an analytical solution. For numerical 
evaluation it is useful to decompose the inner sum into blocks 
$k \in [k_l, k_u]$ with $k_l = 2^y - 1$, $k_u = 2^{y+1} - 2$ and any $y \ge 1$, on which 
$l(k)$ is constant. As in Sec. III-B we let $\delta = c - \alpha(\theta)$, require 
$\delta \in (0, 1/\theta]$, and obtain $d$ from (12). 
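The block decomposition can be illustrated numerically. The following Python sketch shows one possible way to organize the computation; it is not a reproduction of the paper's formula. It evaluates (14) once by direct summation and once blockwise, where on each block $[2^y - 1, 2^{y+1} - 2]$ the pointer length is constant and the geometric partial sum has the closed form $\sum_{k=a}^{b} p q^{k-1} = q^{a-1} - q^{b}$ with $q = 1 - p$. The names `alpha_lz`, `alpha_lz_blocks` and `elias_len` are hypothetical, and the source is the 256-symbol example described below.

```python
import math


def elias_len(j):
    """Elias-delta codeword length for an integer j >= 1."""
    n = j.bit_length() - 1
    return n + 2 * ((n + 1).bit_length() - 1) + 1


def alpha_lz(probs, theta, w, raw_bits):
    """Eq. (14) by direct summation over the pointer values k = 1, ..., w."""
    total = 0.0
    for p in probs:
        q = 1 - p
        hit = sum(p * q ** (k - 1) * math.exp(theta * elias_len(k + 1)) for k in range(1, w + 1))
        miss = q ** w * math.exp(theta * (raw_bits + elias_len(1)))
        total += p * (hit + miss)
    return math.log(total) / theta


def alpha_lz_blocks(probs, theta, w, raw_bits):
    """Same quantity with the inner sum split into blocks k in [2^y - 1, 2^(y+1) - 2] with constant l(k)."""
    total = 0.0
    for p in probs:
        q = 1 - p
        hit, y = 0.0, 1
        while 2 ** y - 1 <= w:
            a, b = 2 ** y - 1, min(2 ** (y + 1) - 2, w)       # last block may be cut at w
            hit += (q ** (a - 1) - q ** b) * math.exp(theta * elias_len(a + 1))
            y += 1
        miss = q ** w * math.exp(theta * (raw_bits + elias_len(1)))
        total += p * (hit + miss)
    return math.log(total) / theta


probs = [1 / 2048] * 240 + [113 / 2048] * 16     # the 256-symbol source of the window-size example
print(alpha_lz(probs, 0.5, 14, 8), alpha_lz_blocks(probs, 0.5, 14, 8))
```

The blockwise version needs only about $\mathrm{ld}\,w$ terms per symbol instead of $w$, which matters once supersymbols enlarge the window considerably.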

TABLE I
Parameters of the Lempel-Ziv coder.

Impact of the Window Size: Assume a source has an alphabet of 256 symbols, i.e., an uncompressed symbol uses 8 bit. Overall, 240 of the symbols each occur with probability 1/2048 and the remaining 16 symbols each with probability 113/2048. The source generates one symbol per timeslot. The encoder groups $s$ consecutive symbols, causing an additional delay of $s-1$ timeslots, to generate supersymbols with cardinality $|X| = 2^{8s}$. It executes the above algorithm on one of these supersymbols every $s$ timeslots, i.e., the encoder periodically generates a codeword for $s$ symbols every $s$ timeslots. Using the periodicity we can write $a(\theta,t) = \lceil t/s \rceil \ln M_l(\theta)/(\theta t)$, where for a single increment $\ln M_l(\theta)/\theta$ equals (14). To compute the delay we choose the parameter $\delta \in (0, 1/\theta]$ as $\delta = c - a(\theta,1)/s$. It follows that $\sup_{t \ge 0}\{(a(\theta,t) + \delta - c)t\} = a(\theta,1)(s-1)/s$, and by insertion into (7) and (8) a delay bound is
$$ d \le \inf_{\theta>0}\left\{\frac{a(\theta,1)(s-1)}{s\,c} - \frac{\ln\big(\theta\,(c - a(\theta,1)/s)\,\varepsilon\big)}{\theta c}\right\} $$
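The periodic structure can be checked numerically. The sketch below (our own illustration with hypothetical function names) evaluates $\sup_t\{(a(\theta,t)+\delta-c)t\}$ for $a(\theta,t)\,t = \lceil t/s\rceil a(\theta,1)$ and $\delta = c - a(\theta,1)/s$ over integer $t$, and minimizes the delay bound above over a grid of $\theta$, keeping only values with $\delta \in (0, 1/\theta]$. Here $a(\theta,1)$ is passed in as a function of $\theta$; for the Lempel-Ziv coder it would be (14) evaluated for supersymbols.

```python
import math


def periodic_excess(a1: float, s: int, t_max: int = 1000) -> float:
    """max over t = 1..t_max of ceil(t/s) a1 - a1 t / s, which equals
    (a(theta,t) + delta - c) t for delta = c - a1/s (t = 0 contributes 0)."""
    return max(math.ceil(t / s) * a1 - a1 * t / s for t in range(1, t_max + 1))


def grouped_delay_bound(a1, s: int, c: float, eps: float, thetas) -> float:
    """Minimum over the theta grid of a1(s-1)/(s c) - ln(theta delta eps)/(theta c)."""
    best = math.inf
    for theta in thetas:
        a = a1(theta)
        delta = c - a / s
        if not 0 < delta <= 1 / theta:          # admissible range of the free parameter
            continue
        d = a * (s - 1) / (s * c) - math.log(theta * delta * eps) / (theta * c)
        best = min(best, d)
    return best
```

With a deterministic toy codeword of 6 bit per pair of symbols ($a(\theta,1) = 6$, $s = 2$), the excess term is $a(\theta,1)(s-1)/s$ as stated, and capacities $c \le a(\theta,1)/s$ yield no finite bound.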



Fig. 5 depicts the performance of the Lempel-Ziv encoder for different parameters $s$, see Tab. I. The maximum pointer length is limited to $8s$ bit. The Elias-delta coding of the pointer becomes more efficient with increasing $s$, such that the window size $w$ that can be addressed increases significantly. Accordingly, the probability $p_{hit}$ that a random sequence of $s$ symbols is found in $w$ increases. Note that since the algorithm operates on sequences of $s$ symbols, the unit of $w$ is $s$ symbols, too. Finally, the normalized average codeword length $\bar{l}/s$ shows the achievable compression gain. For comparison, the entropy rate of the source is $\mathcal{H}_X \approx 4.98$ and the average codeword length achieved by the Huffman coder is $\bar{l} \approx 5.03$, requiring, however, a priori knowledge of the symbol distribution.
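The two reference values can be reproduced with a few lines of Python (a sketch, not the authors' code). The entropy follows directly from the symbol probabilities, and the average length of a binary Huffman code equals the sum of the weights of all merged nodes, since each merge adds one bit to every codeword below it.

```python
import heapq
import math


def entropy_bits(probs) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_average_length(probs) -> float:
    """Average codeword length of a binary Huffman code: each merge of the two
    least likely nodes lengthens every codeword below them by one bit."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total


# Source of the example: 240 symbols with 1/2048 and 16 symbols with 113/2048.
probs = [1 / 2048] * 240 + [113 / 2048] * 16
H = entropy_bits(probs)              # about 4.979 bit
L = huffman_average_length(probs)    # 10307/2048, about 5.033 bit
```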

Moreover, Fig. 5 shows the $(c, d, \varepsilon)$-tradeoff of the Lempel-Ziv coder for different $s$ compared to the Huffman coder. The capacity requirements of the Lempel-Ziv coder improve with increasing $s$, respectively, increasing window size, and eventually approach the entropy. The encoding of sequences of $s$ symbols introduces, however, an additional delay at the encoder. Beyond that, it makes the encoded sequence more bursty, i.e., the encoder emits a codeword for $s$ symbols every $s$ timeslots, which causes further delay.

Fig. 5. Lempel-Ziv coding with different window sizes $w$, compared to Huffman coding. The entropy rate of the source is $\mathcal{H}_X \approx 4.98$. With increasing window size the Lempel-Ziv coder eventually approaches the entropy.



E. Variable Symbol Rate 

So far, we assumed that sources generate symbols at a constant rate. Next, we show how sources with a variable symbol rate can be modeled using conditional MGFs and analyzed by unconditioning. Consider a memoryless source and denote $M_L(\theta)$ the MGF of the increments (9). The conditional MGF of $n$ arrivals becomes $M_A(\theta,n) = (M_L(\theta))^n$. Here, the count of arrivals $N(t)$ is a random process with probability mass function $p_N(n,t)$. The MGF $M_A(\theta,t)$ of the arrival process $A(t)$ follows by unconditioning such that
$$ a(\theta,t) = \frac{1}{\theta t}\ln \sum_{n=0}^{\infty} (M_L(\theta))^n\, p_N(n,t) \qquad (15) $$



Poisson Process: A Poisson process with mean rate $\lambda$ has $p_N(n,t) = e^{-\lambda t}(\lambda t)^n/n!$. By insertion into (15) it follows that
$$ a_A(\theta) = \frac{\lambda}{\theta}\big(M_L(\theta) - 1\big), $$
where we used that $\sum_{n=0}^{\infty} x^n/n! = e^x$. Since $a(\theta)$ does not depend on $t$, delay bounds follow immediately from (12).
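A quick numerical cross-check of (15) against the Poisson closed form is sketched below. The truncation point $n_{max}$ of the infinite sum and the toy increment MGF (constant 3-bit codewords, $M_L(\theta) = e^{3\theta}$) are our own assumptions for illustration.

```python
import math


def a_unconditioned(mgf_increment, pmf, theta: float, t: float, n_max: int) -> float:
    """(15) truncated at n_max: ln(sum_n M_L(theta)^n p_N(n,t)) / (theta t)."""
    m = mgf_increment(theta)
    total = sum(m ** n * pmf(n, t) for n in range(n_max + 1))
    return math.log(total) / (theta * t)


def poisson_pmf(lam: float):
    """p_N(n,t) of a Poisson counting process with rate lam."""
    return lambda n, t: math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)


def a_poisson(mgf_increment, lam: float, theta: float) -> float:
    """Closed form lam (M_L(theta) - 1) / theta; note that it does not depend on t."""
    return lam * (mgf_increment(theta) - 1) / theta


def mgf_3bit(theta: float) -> float:
    return math.exp(3 * theta)      # every symbol encoded with 3 bit
```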

We show an example for a source that generates eight different symbols with geometrically decreasing probability $p_i = 1/2^i$ for $1 \le i \le 7$ and $p_8 = p_7$, such that $\sum_i p_i = 1$. Since the source is dyadic, the codeword lengths of the corresponding Huffman code (as well as the Shannon code) are $l_i = -\mathrm{ld}\,p_i$ bit. The entropy rate as well as the average codeword length are $\mathcal{H}_X = \bar{l} \approx 2$ bit. The MGF of the increments $M_L(\theta)$ follows from (9). Fig. 6 shows the $(c, d, \varepsilon)$-tradeoff from (12) for the Huffman coded Poisson source. For comparison with this doubly random process, we show results for a Huffman coded constant rate source as well as for a hypothetical Poisson source with constant codewords of length $\mathcal{H}_X$ bit. The average symbol rate of all sources is $\lambda = 1$. Clearly, the Huffman coded constant rate source achieves zero delay if $c \ge 7$, since the codewords are at most seven bit long, whereas in case of the Poisson arrival process no such limit exists, since an arbitrarily large number of symbols may arrive within a single timeslot. Finally, results for an uncoded Poisson source, where each symbol is encoded using three bit, are shown to depict the compression gain of the Huffman coder.

Fig. 6. Huffman coded Poisson source compared to an uncoded Poisson source, a hypothetical Poisson source with constant, entropy-sized codewords, and a Huffman coded constant rate source. The double randomness of the Huffman coded Poisson source causes noticeable delays. Compared to an uncoded Poisson source, the Huffman coder achieves, however, a significant improvement.
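The ordering of the sources in Fig. 6 can be sanity-checked through their effective bandwidths. The sketch below (our own, for the dyadic source above with $\lambda = 1$) evaluates $\ln M_L(\theta)/\theta$ for the Huffman coded constant rate source, $\lambda(M_L(\theta)-1)/\theta$ for the Huffman coded Poisson source, and $\lambda(e^{\theta b}-1)/\theta$ for Poisson sources with constant codeword length $b$.

```python
import math

# Dyadic source: p_i = 2^-i for i = 1..7 and p_8 = p_7; Huffman lengths l_i = -ld p_i.
probs = [2.0 ** -i for i in range(1, 8)] + [2.0 ** -7]
lengths = [-math.log2(p) for p in probs]
entropy = sum(p * l for p, l in zip(probs, lengths))   # equals the mean length here


def mgf_lengths(theta: float) -> float:
    return sum(p * math.exp(theta * l) for p, l in zip(probs, lengths))


def a_huffman_constant_rate(theta: float) -> float:
    """One Huffman codeword per timeslot."""
    return math.log(mgf_lengths(theta)) / theta


def a_huffman_poisson(theta: float, lam: float = 1.0) -> float:
    """Poisson symbol arrivals, each symbol Huffman coded."""
    return lam * (mgf_lengths(theta) - 1) / theta


def a_poisson_fixed(theta: float, bits: float, lam: float = 1.0) -> float:
    """Poisson symbol arrivals with a constant codeword length of `bits`."""
    return lam * (math.exp(theta * bits) - 1) / theta
```

Since $M_L(\theta) \ge e^{\theta \mathcal{H}_X}$ (Jensen) and $M_L(\theta) - 1 \ge \ln M_L(\theta)$, the Huffman coded Poisson source needs at least the effective bandwidth of both comparison sources for every $\theta$, and, unlike the constant rate source, it is not bounded by the 7-bit maximum codeword length.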

F. Markov Sources 

In the following we relax the assumption of memoryless sources and consider discrete, stationary Markov sources, i.e., random processes $X(n)$ with first order dependence, where the symbol $x_i$ that occurs in step $n$ depends only on the previous symbol $x_j$ in step $n-1$. The symbol $x_i$ is also referred to as the state of the Markov chain, which can take any of the values $i = 1, 2, \dots, m$. An example of a two-state Markov chain is shown in Fig. 7. We denote $p_i$ the stationary state distribution of the chain and $q_{ij}$ the transition probabilities from state $i$ to state $j$. Define $P$ to be the row vector $(p_1, p_2, \dots, p_m)$ and $Q$ to be the state transition matrix. The stationary state distribution is the solution of $P = PQ$ under the normalization condition $P\mathbf{1} = 1$, where $\mathbf{1}$ is a column vector of ones.
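Solving $P = PQ$ together with $P\mathbf{1} = 1$ amounts to a small linear system. A dependency-free sketch (Gauss-Jordan elimination with partial pivoting; our own illustration, assuming an irreducible chain so that the solution is unique) replaces one of the linearly dependent balance equations by the normalization condition:

```python
def stationary_distribution(Q):
    """Return P with P = P Q and sum(P) = 1 for an irreducible transition matrix Q."""
    m = len(Q)
    # Row j encodes sum_i p_i (q_ij - [i == j]) = 0; the last balance equation is
    # redundant and is replaced by the normalization sum_i p_i = 1.
    A = [[Q[i][j] - (1.0 if i == j else 0.0) for i in range(m)] + [0.0]
         for j in range(m - 1)]
    A.append([1.0] * m + [1.0])
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col and A[r][col] != 0.0:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]
```

For two states this reproduces $p_1 = q_{21}/(q_{12}+q_{21})$, e.g., $P = (5/8, 3/8)$ for $Q = (4/5, 1/5; 1/3, 2/3)$.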

Due to the first order dependence, the entropy rate of a Markov source becomes $\mathcal{H}_X = H(X(n)|X(n-1))$ [11] and, using the notation above, $\mathcal{H}_X = -\sum_i \sum_j p_i q_{ij}\,\mathrm{ld}\,q_{ij}$. Next, we compute information envelopes for Markov sources. The MGF of a discrete Markov chain that produces a constant amount of data $l_i$ if it is in state $i$ is known from [8]. Let $L$ be the diagonal matrix $\mathrm{diag}(e^{\theta l_1}, e^{\theta l_2}, \dots, e^{\theta l_m})$. As before, we substitute $n = t$, assuming a source that emits symbols at a constant rate of one symbol per timeslot. The effective bandwidth of the Markov chain for $t \ge 1$ is known as
$$ a(\theta,t) = \frac{1}{\theta t}\ln\big(P\,(L(\theta)Q)^{t-1}L(\theta)\,\mathbf{1}\big) \qquad (16) $$

Fig. 7. Example two-state Markov chain.

Regarding (16), we can, however, not substitute $l_i$ by the amount of information generated in state $i$, since the information provided by symbol $x_i$ depends on the previous symbol $x_j$, i.e., each symbol has conditional information $I(x_i|x_j) = -\mathrm{ld}\,q_{ji}$ bit. Overall, for a Markov chain with $m$ states we can distinguish $m^2$ distinct pairs of successive symbols.

To solve the problem posed by the conditional information, we extend the state space from $m$ to $m^2$ states. We denote the states $i|j$, respectively, $x_i|x_j$, meaning that symbol $x_i$ occurred in the current timeslot after symbol $x_j$ occurred in the previous timeslot. Due to this expansion, the information generated by a single symbol in any state of the chain is uniquely determined by the state itself, i.e., the information generated by symbol $x_i$ in state $i|j$ is $I(x_i|x_j) = -\mathrm{ld}\,q_{ji}$. The transition probability from state $j|k$ to state $i|j$ is $q_{ji}$ for any $i, j, k$, and zero otherwise. Fig. 8 shows the accordingly extended Markov model for the example from Fig. 7. Given the transition matrix of the extended Markov model, we compute the stationary state distribution and let $l_{i|j} = -\mathrm{ld}\,q_{ji}$ to compute $a(\theta,t)$ from (16). An information envelope follows from (5).

Two-state Markov Source: We show an example for a two-state Markov source as depicted in Fig. 7. The stationary state distribution follows from the balance equations as $p_1 = q_{21}/(q_{12} + q_{21})$ and $p_2 = q_{12}/(q_{12} + q_{21})$. As a measure of the burstiness of the source, we use the average time to change state twice, $T = 1/q_{12} + 1/q_{21}$. We choose $p_1 = 5/8$ and $p_2 = 3/8$ and use the burstiness parameters $T \approx 4.3$, $T = 8$, and $T = 16$. The corresponding state transition matrices $Q = (q_{11}, q_{12}; q_{21}, q_{22})$ are $Q = (5/8, 3/8; 5/8, 3/8)$, i.e., the source is memoryless, $Q = (4/5, 1/5; 1/3, 2/3)$, and $Q = (9/10, 1/10; 1/6, 5/6)$, respectively.

The entropy of a single symbol follows as $H_X = -\sum_i p_i\,\mathrm{ld}\,p_i \approx 0.95$ bit, and the entropy rate as $\mathcal{H}_X = H(X(n)|X(n-1)) = -\sum_i \sum_j p_i q_{ij}\,\mathrm{ld}\,q_{ij} \approx 0.95$, $0.80$, and $0.54$ bit, respectively. We use the extended model in Fig. 8, which has the stationary state distribution $p_{1|1} = p_1 q_{11}$, $p_{1|2} = p_2 q_{21}$, $p_{2|1} = p_1 q_{12}$, and $p_{2|2} = p_2 q_{22}$. We compute $a(\theta,t)$ from (16) and information envelopes $F(t)$ from (5), which we minimize for $\theta > 0$.

Fig. 8. Extended Markov model for the example from Fig. 7, where the information generated by symbol $x_i$, given the previous symbol was $x_j$, is uniquely determined by the state $x_i|x_j$ itself.

Fig. 9. Increments of the information envelopes of a two-state Markov source with different burstiness parameters $T$, where $T \approx 4.3$ corresponds to a memoryless source. The upper figure zooms into the lower one. While increasing memory $T$ reduces the entropy rate $\mathcal{H}_X$, it causes a slower convergence of the information envelopes, i.e., the source can deviate significantly from its expected information rate with non-negligible probability.
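The extended model and (16) can be sketched in a few lines of Python (our own illustration with hypothetical helper names, not the authors' implementation). The code builds the $m^2$-state chain with $p_{i|j} = p_j q_{ji}$ and $I(x_i|x_j) = -\mathrm{ld}\,q_{ji}$, recomputes the entropy rates of the two-state examples, and evaluates (16) by iterating a vector from the right. As $\theta \to 0$, $a(\theta,t)$ approaches the entropy rate; for $t = 2$ and large $\theta$ it approaches the alternating worst case $(I(x_2|x_1) + I(x_1|x_2))/2$ behind the zigzags in Fig. 9.

```python
import math


def extended_model(P, Q):
    """Extended m^2-state chain; state (i, j) means x_i now after x_j before.
    Returns the states, stationary probabilities p_{i|j} = p_j q_ji, the
    transition matrix, and the information I(x_i|x_j) = -ld q_ji of each state."""
    m = len(P)
    states = [(i, j) for j in range(m) for i in range(m)]
    pi = [P[j] * Q[j][i] for (i, j) in states]
    # from state (j, k) to state (i, j2): probability q_ji if j2 == j, else 0
    Qx = [[Q[j][i] if j2 == j else 0.0 for (i, j2) in states] for (j, _k) in states]
    info = [-math.log2(Q[j][i]) if Q[j][i] > 0 else 0.0 for (i, j) in states]
    return states, pi, Qx, info


def effective_bandwidth(P, Q, lengths, theta, t):
    """(16): ln(P (L Q)^(t-1) L 1) / (theta t) with L = diag(exp(theta l_i))."""
    e = [math.exp(theta * l) for l in lengths]
    v = e[:]                                        # L 1
    for _ in range(t - 1):                          # v <- L Q v
        v = [e[r] * sum(Q[r][c] * v[c] for c in range(len(v))) for r in range(len(v))]
    return math.log(sum(p * x for p, x in zip(P, v))) / (theta * t)


P = [5 / 8, 3 / 8]
examples = {
    "T~4.3": [[5 / 8, 3 / 8], [5 / 8, 3 / 8]],
    "T=8": [[4 / 5, 1 / 5], [1 / 3, 2 / 3]],
    "T=16": [[9 / 10, 1 / 10], [1 / 6, 5 / 6]],
}
```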

Fig. 9 shows the increments of the envelopes $F(t)$ with violation probability $\varepsilon = 10^{-\ldots}$. For small $t \lesssim 10$ the envelopes are determined by the worst case, i.e., the maximal amount of information that can be emitted by the Markov source. For parameter $T = 8$ the occurrence of symbol $x_2$ after symbol $x_1$ has the largest information $I(x_2|x_1) = -\mathrm{ld}\,q_{12} \approx 2.32$ bit, followed by the occurrence of symbol $x_1$ after symbol $x_2$ with $I(x_1|x_2) = -\mathrm{ld}\,q_{21} \approx 1.58$ bit. Since direct transitions from state $x_2|x_1$ to state $x_2|x_1$ are not possible, the maximal information is achieved by a sequence of alternating $x_1$ and $x_2$, causing the zigzags between $I(x_2|x_1)$ and $I(x_1|x_2)$ for small $t$. The same argument applies for $T = 16$. In contrast, for $T \approx 4.3$ the source is memoryless, such that the information $I(x_2|x_1) = -\mathrm{ld}\,q_{12}$ equals $I(x_2|x_2) = -\mathrm{ld}\,q_{22} \approx 1.42$ bit, i.e., the maximum information is achieved by a sequence of all $x_2$, such that zigzags do not occur.

Due to statistical effects, for $t \gtrsim 10$ the worst case occurs with probability less than the violation probability $\varepsilon$, such that it does not dominate the envelopes, which approach the entropy rate for large $t$. While increasing memory $T$ reduces the entropy rate, it causes, however, a significantly slower convergence of the envelope. This is due to unfavorable, high-information sequences of symbols that are not excluded from the envelope by the violation probability $\varepsilon$.

Fig. 10. Extended Markov model for the example from Fig. 7, where states correspond to the occurrence of supersymbols that are sequences of $s$ symbols, here $s = 2$.

G. Coding Markov Sources 

For our investigation of coded Markov sources, we assign to each symbol $x_i$ a codeword of length $l_i$ without requiring further assumptions about the coder used. We compute $a(\theta,t)$ of a coded Markov source from (16). To compute a delay bound from (8) we require that $C_E(c)$ from (7) is finite. Since $a(\theta,t)$ increases in $t$, it has to hold that $c > a(\theta,t)$ for all $t > 0$. We choose the free parameter $\delta \in (0, 1/\theta]$ as $\delta = c - \sup_{t>0}\{a(\theta,t)\}$. It follows that $C_E(c) = -\ln(\theta\delta)/\theta$, and a delay bound with error probability $\varepsilon$ is
$$ d \le \inf_{\theta>0}\left\{-\frac{\ln\big(\theta\,(c - \sup_{t>0}\{a(\theta,t)\})\,\varepsilon\big)}{\theta c}\right\} \qquad (17) $$
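Evaluating (17) requires the supremum of $a(\theta,t)$ over $t$. The sketch below (our own illustration) computes $a(\theta,t)$ from (16) for $t = 1, \dots, t_{max}$ in a single pass, renormalizing the row vector $P L (Q L)^{t-1}$ at every step to avoid overflow, and approximates the supremum by the maximum over this finite horizon. The horizon $t_{max}$, the $\theta$ grid, and the small rounding tolerance on $\delta \le 1/\theta$ are assumptions that should be adapted to the chain at hand.

```python
import math


def effective_bandwidths(P, Q, lengths, theta, t_max):
    """[a(theta,1), ..., a(theta,t_max)] from (16) via u_t = P L (Q L)^(t-1)."""
    m = len(P)
    e = [math.exp(theta * l) for l in lengths]
    u = [P[i] * e[i] for i in range(m)]
    log_mass, out = 0.0, []
    for t in range(1, t_max + 1):
        if t > 1:
            u = [sum(u[i] * Q[i][j] for i in range(m)) * e[j] for j in range(m)]
        mass = sum(u)
        log_mass += math.log(mass)          # running ln(P (L Q)^(t-1) L 1)
        u = [x / mass for x in u]           # renormalize to avoid overflow
        out.append(log_mass / (theta * t))
    return out


def delay_bound_17(P, Q, lengths, c, eps, thetas, t_max=200):
    """(17) minimized over a theta grid; returns math.inf if no theta is admissible."""
    best = math.inf
    for theta in thetas:
        delta = c - max(effective_bandwidths(P, Q, lengths, theta, t_max))
        if not 0 < delta <= (1 / theta) * (1 + 1e-12):   # tolerance for rounding
            continue
        best = min(best, -math.log(theta * delta * eps) / (theta * c))
    return best
```

If $c$ does not exceed the average codeword length, no $\theta$ is admissible and the sketch returns an infinite bound.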



The compression gain of such a straightforward encoding of a Markov source is, however, limited by the entropy of a single symbol $H_X$, since the memory of the source is not utilized. To achieve further compression down to the entropy rate $\mathcal{H}_X$, the coder has to be adapted. One approach is to encode sequences of $s$ symbols instead of single symbols. In this case the average normalized codeword length is limited by $H(X(1), X(2), \dots, X(s))/s$, which approaches $\mathcal{H}_X$ for $s \to \infty$. Consider a Markov source with $m$ distinct symbols, respectively, states. If we group $s$ subsequent symbols, we can distinguish $m^s$ supersymbols. To model such groups of symbols, we extend the state space of the Markov chain to $m^s$ states accordingly. Fig. 10 shows the extended model for the Markov chain from Fig. 7 for $s = 2$. Here, states $x_i,x_j$, respectively, $i,j$ denote the group of symbol $x_j$ followed by symbol $x_i$. Hence, the state transition probabilities from state $k,y$ to state $i,j$ are $q_{kj}q_{ji}$ for any $i, j, k, y$. We assign a unique codeword to each of the $m^s$ groups of $s$ symbols and use the codeword lengths to determine the diagonal matrix $L$. For a coder that encodes a group of $s$ symbols every $s$ timeslots, $a(\theta,t)$ follows from the extended Markov model for $t \ge 1$ as $a(\theta,t) = \ln\big(P(L(\theta)Q)^{\lceil t/s \rceil - 1}L(\theta)\mathbf{1}\big)/(\theta t)$. To obtain the delay bound from (8), we choose the free parameter $\delta \in (0, 1/\theta]$ as $\delta = c - \sup_{t>0}\{a(\theta, st)\}$ and compute $C_E(c)$ from (7). Grouping $s$ symbols adds an additional delay of $s-1$ timeslots.
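The entries of Tab. II can be reproduced by enumerating the $m^s$ supersymbols of the two-state source with $Q = (9/10, 1/10; 1/6, 5/6)$, whose stationary distribution is $P = (5/8, 3/8)$. The following sketch (our own, with hypothetical helper names) computes $H(X(1), \dots, X(s))/s$ and the normalized average length of a Huffman code built on the supersymbol probabilities. For a stationary first-order Markov source the block entropy also satisfies $H(X(1), \dots, X(s)) = H_X + (s-1)\mathcal{H}_X$.

```python
import heapq
import itertools
import math

Q = [[9 / 10, 1 / 10], [1 / 6, 5 / 6]]
P = [5 / 8, 3 / 8]                      # stationary: 5/8 * 1/10 == 3/8 * 1/6


def supersymbol_probs(P, Q, s):
    """Probabilities of all m^s sequences x(1), ..., x(s) of the stationary source."""
    probs = []
    for seq in itertools.product(range(len(P)), repeat=s):
        p = P[seq[0]]
        for prev, cur in zip(seq, seq[1:]):
            p *= Q[prev][cur]
        probs.append(p)
    return probs


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_average_length(probs):
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged                 # each merge adds one bit to all leaves below
        heapq.heappush(heap, merged)
    return total


def table_row(s):
    """(H(X(1..s))/s, average Huffman length / s) for groups of s symbols."""
    probs = supersymbol_probs(P, Q, s)
    return entropy_bits(probs) / s, huffman_average_length(probs) / s
```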

Two-State Markov Source: As an example, we employ the two-state Markov source shown in Fig. 7 with transition matrix $Q = (9/10, 1/10; 1/6, 5/6)$ and encode groups of $s$ symbols using a Huffman coder. Tab. II shows the entropy and the average codeword length normalized by $s$ for $s = 1, \dots, 8$. Clearly, the entropy decreases with increasing $s$, and for $s \to \infty$ we find the entropy rate $\mathcal{H}_X \approx 0.537$. As Tab. II confirms, the Huffman encoding of groups of symbols can approach the entropy rate quite well, however, at the cost of delays. In Fig. 11 we show a delay bound subject to an error probability of $\varepsilon = 10^{-\ldots}$. The delays are due to the variability of the codeword lengths and due to the grouping of $s$ symbols, which causes an additional delay of $s-1$ timeslots. Moreover, the grouping makes the encoded sequence more bursty. Depending on $c$, different values of $s$ are optimal, e.g., if $c > 1$ the delay is minimized for $s = 1$, whereas for smaller $c$ larger $s$ are advantageous. Certain parameters, i.e., $s = 3, \dots, 6$, marked by dotted lines, are outperformed for all $c$. This effect is caused by the individual Huffman codes for each $s$, which are more or less efficient.

TABLE II
Parameters of the Huffman coder.

s    $H(X(1),\dots,X(s))/s$ [bit]    $\bar{l}/s$ [bit]
1    0.954    1.000
2    0.746    0.781
3    0.676    0.682
4    0.641    0.651
5    0.620    0.630
6    0.607    0.618
7    0.600    0.602
8    0.589    0.593

Fig. 11. Huffman coded Markov source. Due to the memory, the normalized entropy of groups of $s$ symbols decreases with increasing $s$. The average codeword length of the Huffman code approaches the entropy rate with increasing $s$, resulting, however, in delays due to the variability of the codeword lengths and due to the grouping of symbols.

As an alternative to the grouping of symbols described above, Markov sources can be encoded efficiently using individual codes for each of the states, i.e., the last symbol determines the code that is used to encode the next symbol. To model an encoder that chooses the code depending on the last symbol, we extend the Markov model as described in Sec. III-F, e.g., Fig. 8 for a two-state Markov chain. We denote $l_{i|j}$ the length of the codeword that is used for symbol $x_i$ given the last symbol is $x_j$. Using the extended model, $a(\theta,t)$ follows from (16). A delay bound can be computed from (17).

Example for State Dependent Codes: Consider a three-state Markov source with transition matrix $Q = (1/2, 1/4, 1/4; 1/4, 1/2, 1/4; 1/4, 1/4, 1/2)$. We construct an extended nine-state Markov model, as in Sec. III-F, where the code used in state $i|j$ to encode symbol $x_i$ is conditioned on the last symbol $x_j$. Accordingly, if the last symbol was $x_1$, the optimal codeword lengths are $l_{1|1} = 1$ bit, $l_{2|1} = 2$ bit, and $l_{3|1} = 2$ bit, whereas $l_{1|2} = 2$ bit, $l_{2|2} = 1$ bit, and $l_{3|2} = 2$ bit apply if the last symbol was $x_2$, and $l_{1|3} = 2$ bit, $l_{2|3} = 2$ bit, and $l_{3|3} = 1$ bit if the last symbol was $x_3$.
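For the three-state example, the gain of state-dependent codes can be checked directly. The sketch below (our own illustration) builds the extended nine-state model, assigns the conditional codeword lengths $l_{i|j}$ of the example, and evaluates the average codeword length and $a(\theta,t)$ from (16). Since every conditional distribution is dyadic, the average length equals the entropy rate of 1.5 bit, whereas a single Huffman code for the uniform stationary distribution would need 5/3 bit per symbol.

```python
import math

Q = [[1 / 2, 1 / 4, 1 / 4], [1 / 4, 1 / 2, 1 / 4], [1 / 4, 1 / 4, 1 / 2]]
P = [1 / 3, 1 / 3, 1 / 3]               # Q is doubly stochastic, so P is uniform
m = len(P)

# Extended model: state (i, j) = symbol x_i after x_j, stationary probability p_j q_ji.
states = [(i, j) for j in range(m) for i in range(m)]
pi = [P[j] * Q[j][i] for (i, j) in states]
Qx = [[Q[j][i] if j2 == j else 0.0 for (i, j2) in states] for (j, _k) in states]
# State-dependent code: repeating the last symbol costs 1 bit, any other symbol 2 bit.
lengths = [1.0 if i == j else 2.0 for (i, j) in states]

average_length = sum(p * l for p, l in zip(pi, lengths))
entropy_rate = -sum(P[j] * Q[j][i] * math.log2(Q[j][i]) for j in range(m) for i in range(m))


def effective_bandwidth(P, Q, lengths, theta, t):
    """(16) with L = diag(exp(theta l)), evaluated as P L (Q L)^(t-1) 1."""
    e = [math.exp(theta * l) for l in lengths]
    u = [p * x for p, x in zip(P, e)]
    for _ in range(t - 1):
        u = [sum(u[r] * Q[r][c] for r in range(len(u))) * e[c] for c in range(len(u))]
    return math.log(sum(u)) / (theta * t)
```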



IV. Transmission via a Gilbert-Elliott Channel 



In this section, we show how our results on source coding from Sec. III can be composed with channel models, such as the Gilbert-Elliott channel. Key to this composition is the additivity established by Lem. 1. To this end, we require a service curve model of the channel.

Service curves of wireless channels have been derived, e.g., in [2], [16], [22], [28], [36]. For ease of exposition, we resort to the impairment model from [22]. The model assumes a work-conserving channel, e.g., with peak rate $R$, that is impaired by a stationary random process $I(\tau,t)$. Given $I(\tau,t)$ has envelope $E(t)$ with overflow profile $\varepsilon_E(\sigma)$, the channel has service curve $S(t) = Rt - E(t)$ with deficit profile $\varepsilon_S(\sigma) = \varepsilon_E(\sigma)$.

We assume a two-state Gilbert-Elliott channel that is either in the good state, i.e., data are transmitted error-free with rate $R$, or in the bad state, i.e., data cannot be decoded and are lost. The transitions between the two states are first order dependent, i.e., the model is a Markov chain. Using the impairment model, the corresponding impairment process is a two-state Markov chain that has rate zero in state 1 (good) and rate $R$ in state 2 (bad) [16], i.e., it consumes no or all available resources, respectively. The effective bandwidth $a(\theta,t)$ of the impairment process is given by (16), and an envelope follows as $E(t) = (a(\theta,t) + \delta)t - \ln(\theta\delta)/\theta$ with $\varepsilon_S(\sigma) = e^{-\theta\sigma}$, where $\theta > 0$ and $\delta \in (0, 1/\theta]$, see Sec. III-B.

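Putting all pieces together, we compute $S(t)$ and obtain the delay bound $(C_S(c) + \sigma_S)/c$ with error probability $\varepsilon_S(\sigma_S) = e^{-\theta\sigma_S}$ for arrivals with constant rate $c$, where
$$ C_S(c) = \sup_{t \ge 0}\{(c + a(\theta,t) + \delta - R)t\} - \frac{\ln(\theta\delta)}{\theta} $$
As before, we let $\sigma_S = -\ln\varepsilon_S/\theta$ and choose $\delta \in (0, 1/\theta]$ as $\delta = R - c - \sup_{t>0}\{a(\theta,t)\}$, such that $C_S(c) = -\ln(\theta\delta)/\theta$, and a delay bound with error probability $\varepsilon_S$ is
$$ d \le \inf_{\theta>0}\left\{-\frac{\ln\big(\theta\,(R - c - \sup_{t>0}\{a(\theta,t)\})\,\varepsilon_S\big)}{\theta c}\right\} $$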
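The channel bound can be evaluated numerically in the same way. The sketch below (our own illustration) uses the impairment process of the example that follows, with peak rate $R = 6$, transition matrix $Q = (7/8, 1/8; 1/4, 3/4)$, and impairment rates 0 (good) and $R$ (bad). It computes $a(\theta,t)$ from (16) and approximates $\sup_{t>0} a(\theta,t)$ by a maximum over a finite horizon $t_{max}$, which is an assumption.

```python
import math

R = 6.0                                   # peak rate of the channel
Q = [[7 / 8, 1 / 8], [1 / 4, 3 / 4]]      # state 1 good, state 2 bad
P = [2 / 3, 1 / 3]                        # stationary distribution of Q
rates = [0.0, R]                          # impairment consumes nothing or everything


def effective_bandwidths(P, Q, rates, theta, t_max):
    """[a(theta,1), ..., a(theta,t_max)] of the impairment process from (16)."""
    m = len(P)
    e = [math.exp(theta * r) for r in rates]
    u = [P[i] * e[i] for i in range(m)]
    log_mass, out = 0.0, []
    for t in range(1, t_max + 1):
        if t > 1:
            u = [sum(u[i] * Q[i][j] for i in range(m)) * e[j] for j in range(m)]
        mass = sum(u)
        log_mass += math.log(mass)
        u = [x / mass for x in u]         # renormalize to avoid overflow
        out.append(log_mass / (theta * t))
    return out


def channel_delay_bound(c, eps, thetas, t_max=200):
    """Minimum over theta of -ln(theta (R - c - sup_t a(theta,t)) eps) / (theta c)."""
    best = math.inf
    for theta in thetas:
        delta = R - c - max(effective_bandwidths(P, Q, rates, theta, t_max))
        if not 0 < delta <= (1 / theta) * (1 + 1e-12):
            continue
        best = min(best, -math.log(theta * delta * eps) / (theta * c))
    return best
```

The mean impairment is $R p_2 = 2$, so constant arrival rates $c \ge 4$ admit no finite bound.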



A delay bound for variable rate arrivals from a source coder follows by a simple addition of the respective Legendre transforms, i.e., from Lem. 1, $(C_E(c) + C_S(c) + \sigma_E + \sigma_S)/c$ is a delay bound for the composed system with error probability $\varepsilon_E(\sigma_E) + \varepsilon_S(\sigma_S)$.

Fig. 12. Transmission of a Huffman coded source via a Gilbert-Elliott channel. The average codeword length of the source is 2 and the average rate of the channel is 4. The individual curves show the delay bound obtained for the Gilbert-Elliott channel given constant rate arrivals with rate $c$, respectively, obtained for the Huffman coded source given a channel with constant service rate $c$. The delay bound for the composite system is obtained from Lem. 1 by taking the minimum of the sum of the two curves, i.e., 53 timeslots.
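Operationally, Lem. 1 reduces the composition to adding two curves and minimizing over $c$, while the error probabilities of the two bounds add up. The minimal sketch below uses two hypothetical, purely illustrative delay curves, $d_E(c) = 1/(c-2)$ for a source coder and $d_S(c) = 1/(4-c)$ for a channel, each playing the role of $(C(c) + \sigma)/c$; in practice they would be the bounds derived above for a source coder and for the Gilbert-Elliott channel.

```python
import math


def d_source(c):        # hypothetical source-coder curve, finite for c > 2
    return 1 / (c - 2) if c > 2 else math.inf


def d_channel(c):       # hypothetical channel curve, finite for c < 4
    return 1 / (4 - c) if c < 4 else math.inf


def composite_delay_bound(d_e, d_s, c_grid):
    """Minimum over c of d_e(c) + d_s(c); returns (bound, minimizing c)."""
    return min((d_e(c) + d_s(c), c) for c in c_grid)


c_grid = [2 + k / 1000 for k in range(1, 2000)]
bound, c_star = composite_delay_bound(d_source, d_channel, c_grid)
```

Here the sum is minimized at $c = 3$ with a composite bound of 2 timeslots; for the curves of Fig. 12 the same procedure gives the 53 timeslots reported there.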



Transmission of a Huffman Coded Source: We consider transmitting the source from Fig. 3 via a Gilbert-Elliott channel with peak rate $R = 6$ and a two-state Markov impairment process with transition matrix $Q = (7/8, 1/8; 1/4, 3/4)$. The state probabilities in equilibrium are $P = (2/3, 1/3)$, and the average rate of the channel is 4. Fig. 12 shows the individual capacity-delay-error-tradeoffs of the source coder and the channel, each with probability of error $\varepsilon_E = \varepsilon_S = 10^{-\ldots}$. Moreover, we show the sum of the two curves, which is a delay bound for the composite system consisting of the Huffman source coder and the Gilbert-Elliott channel for any $c > 0$. While $c$ has the interpretation of a constant arrival rate, respectively, constant service rate if we consider $C_S(c)$ and $C_E(c)$ in isolation, it does not have such a physical meaning for the composite system, where $C_S(c) + C_E(c)$ can be minimized over $c > 0$. The minimal delay bound of the composite system follows as 53 timeslots with probability of error $\varepsilon = 2 \cdot 10^{-\ldots}$.

V. Related Work 

Neglecting the variability and delay sensitivity of real sources, information theory has not become widely accepted in networking so far, see [14] for an excellent survey and a discussion of the gap between the respective theories. Recently, [3] proposes non-equilibrium information theory as a new paradigm and highlights the potentialities, difficulties, and possible approaches. The authors envision a characterization of mobile ad-hoc networks by "throughput-delay-reliability-triplets." In this paper we derived a feasible implementation and provided respective models for source coders and channels that complement the vision.

The variability of fading channels is already considered in [29], where a notion of outage capacity is defined. The outage capacity models the probability of errors that occur when the transmission rate is larger than the instantaneous capacity of the channel. A related concept, the delay-limited capacity [21], compensates fluctuations of the fading process using power control to achieve a constant transmission rate. Subsequent works use related concepts to implement power control subject to additional buffering constraints [5], [27]. Recently, the impact of finite blocklength codes on the variability of the channel has been investigated, e.g., in [4], [30], [31].

While the definition of outage capacity does not contain any queueing-theoretic aspects, it can be incorporated into a queueing analysis, as shown in [1] using the M|G|1 model. Markovian queues have also been parameterized to model fading channels in [6], [7]. While [6] models a block fading process by a variable rate server that is governed by an embedded Markov chain, [7] views fading outages as an impairment process that is modeled by high priority customers at an M|G|1 priority queue. The concept of an impairment process was also introduced to the stochastic network calculus to analyze outages of wireless channels [22]. Similar to the concept of effective bandwidth, [36] develops an effective capacity model to analyze delays due to fading. Multi-access channels are modeled in [34] as a processor sharing queueing system whose capacity is adapted according to the interference created by active stations.

Regarding traffic sources, networking research frequently assumes certain stochastic processes or employs traffic traces. In [20] it is shown how the effective bandwidth of traces, e.g., for MPEG video, can be computed, and in [35] empirical envelopes for variable bit rate traffic are derived. The models facilitate performance analysis of networks using respective queueing models. Information theoretic concepts themselves are, however, not used. Recent papers [10], [19] provide a framework that includes network elements that process and re-scale data into the analysis. In this work, we model the compression of data by source coders, which complements the approach.

A calculus for so-called information-driven networks is introduced in [37], where the focus is on information instead of data traffic. To this end, the entropy function is employed to convert the data of a flow $A(t)$ to its expected information $H(A(t))$. By substitution of $H(A(t))$ for $A(t)$, the framework of the network calculus is used to compute redefined metrics such as the information backlog and the information delay. Compared to [37], in this work we did not define envelopes for the expected information of a source. Instead, we derived envelopes for the actual amount of bits generated by memoryless as well as Markov sources and for different implementations of source coders.

VI. Conclusion 

In this paper, we investigated a statistical envelope-based approach towards a non-equilibrium information theory. We applied Legendre transforms to characterize sources and systems by their achievable capacity-delay-error-tradeoff. The additivity of the model facilitates a separability of sources and systems that is comparable to the separation of entropy and channel capacity in information theory. In addition to the average behavior, statistical envelopes and their Legendre transforms consider non-negligible deviations that can cause significant network latencies. If arbitrary delays are permitted, our model recovers the entropy, respectively, the average codeword length in the limit. We provided information envelopes for memoryless as well as Markov sources, where we showed how the memory increases the variability. We derived the capacity-delay-error-tradeoff of Huffman, Shannon, and Lempel-Ziv coders as well as of Gilbert-Elliott channels. Our models are applicable in the frameworks of the theory of effective bandwidths and the stochastic network calculus, enabling joint information- and queueing-theoretic cross-layer research.

References 

[1] N. Ahmed and R. G. Baraniuk. Throughput measures for delay-constrained communications in fading channels. In Proc. Allerton Conference on Communication, Control and Computing, 2003.
[2] S. Akin and M. C. Gursoy. Effective capacity analysis of cognitive radio channels for quality of service provisioning. IEEE Trans. Wireless Commun., 9(11):3354-3364, Nov. 2010.
[3] J. Andrews, S. Shakkottai, R. Heath, N. Jindal, M. Haenggi, R. Berry, D. Guo, M. Neely, S. Weber, S. Jafar, and A. Yener. Rethinking information theory for mobile ad hoc networks. IEEE Commun. Mag., 46(12):94-101, 2008.
[4] D. Baron, M. A. Khojastepour, and R. G. Baraniuk. How quickly can we approach channel capacity? In Proc. Asilomar Conference on Signals, Systems, and Computers, Nov. 2004.
[5] R. A. Berry and R. G. Gallager. Communication over fading channels with delay constraints. IEEE Trans. Inf. Theory, 48(5):1135-1149, 2002.
[6] I. Bettesh and S. Shamai. Queuing analysis of the single user fading channel. In Proc. IEEE Convention of the Electrical and Electronic Engineers in Israel, 2000.
[7] J. Burdin and R. Landry. Delay analysis of wireless Nakagami fading channels. In Proc. IEEE Globecom, 2008.
[8] C.-S. Chang. Performance Guarantees in Communication Networks. Springer-Verlag, 2000.
[9] F. Ciucu, A. Burchard, and J. Liebeherr. Scaling properties of statistical end-to-end bounds in the network calculus. IEEE/ACM Trans. Netw., 14(6):2300-2312, 2006.
[10] F. Ciucu, J. B. Schmitt, and H. Wang. On expressing networks with flow transformations in convolution-form. In Proc. IEEE INFOCOM, Apr. 2011.
[11] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, second edition, 2006.
[12] R. L. Cruz. A calculus for network delay, part I and II: Network elements in isolation and network analysis. IEEE Trans. Inf. Theory, 37(1):114-141, 1991.
[13] R. L. Cruz. Quality of service management in Integrated Services networks. In Proc. Semi-Annual Research Review, Center of Wireless Communication, UCSD, June 1996.
[14] A. Ephremides and B. Hajek. Information theory and communications networks: An unconsummated union. IEEE Trans. Inf. Theory, 44(6):2416-2434, 1998.
[15] M. Fidler. An end-to-end probabilistic network calculus with moment generating functions. In Proc. 14th IEEE International Workshop on Quality of Service (IWQoS), pages 261-270, 2006.
[16] M. Fidler. A network calculus approach to probabilistic quality of service analysis of fading channels. In Proc. IEEE Globecom, 2006.
[17] M. Fidler. Survey of deterministic and stochastic service curve models in the network calculus. IEEE Communications Surveys & Tutorials, 12(1):59-86, 2010.
[18] M. Fidler and S. Recker. Conjugate network calculus: A dual approach applying the Legendre transform. Computer Networks, 50(8):1026-1039, 2006.
[19] M. Fidler and J. B. Schmitt. On the way to a distributed systems calculus: an end-to-end network calculus with data scaling. In Proc. ACM SIGMETRICS/Performance, pages 287-298, 2006.
[20] R. J. Gibbens. Traffic characterisation and effective bandwidths for broadband network traces. In Stochastic Networks: Theory and Applications, number 4 in Royal Statistical Society Lecture Notes, pages 169-179. Oxford University Press, 1996.
[21] S. V. Hanly and D. N. Tse. Multiaccess fading channels, Part II: Delay-limited capacities. IEEE Trans. Inf. Theory, 44(7):2816-2831, 1998.
[22] Y. Jiang and Y. Liu. Stochastic Network Calculus. Springer-Verlag, 2008.
[23] G. Katona and O. Nemetz. Huffman codes and self-information. IEEE Trans. Inf. Theory, 22(3):337-340, 1976.
[24] F. P. Kelly. Notes on effective bandwidths. In Stochastic Networks: Theory and Applications, number 4 in Royal Statistical Society Lecture Notes, pages 141-168. Oxford University Press, 1996.
[25] J.-Y. Le Boudec and P. Thiran. Network Calculus: A Theory of Deterministic Queuing Systems for the Internet. Springer-Verlag, 2001.
[26] C. Li, A. Burchard, and J. Liebeherr. A network calculus with effective bandwidth. IEEE/ACM Trans. Netw., 15(6):1442-1453, 2007.
[27] X. Li, X. Dong, and D. Wu. Queue length aware power control for delay-constrained communication over fading channels. Wireless Communications and Mobile Computing. To appear.
[28] K. Mahmood, A. Rizk, and Y. Jiang. On the flow-level delay of a spatial multiplexing MIMO wireless channel. In Proc. IEEE ICC, June 2011.
[29] L. H. Ozarow, S. Shamai, and A. D. Wyner. Information theoretic considerations for cellular mobile radio. IEEE Trans. Veh. Technol., 43(2):359-378, 1994.
[30] Y. Polyanskiy, H. V. Poor, and S. Verdú. Dispersion of the Gilbert-Elliott channel. In Proc. IEEE ISIT, June 2009.
[31] Y. Polyanskiy, H. V. Poor, and S. Verdú. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory, 56(5):2307-2359, May 2010.
[32] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1972.
[33] D. Salomon. Variable-length Codes for Data Compression. Springer-Verlag, 2007.
[34] E. Telatar and R. G. Gallager. Combining queueing theory with information theory for multiaccess. IEEE J. Sel. Areas Commun., 13(6):963-969, 1995.
[35] D. E. Wrege, E. W. Knightly, H. Zhang, and J. Liebeherr. Deterministic delay bounds for VBR video in packet-switching networks: Fundamental limits and practical trade-offs. IEEE/ACM Trans. Netw., 4(3):352-362, 1996.
[36] D. Wu and R. Negi. Effective capacity: A wireless link model for support of quality of service. IEEE Trans. Wireless Commun., 2(4):630-643, 2003.
[37] K. Wu, Y. Jiang, and G. Hu. A calculus for information-driven networks. In Proc. IEEE IWQoS, July 2009.
[38] A. Wyner and J. Ziv. The sliding-window Lempel-Ziv algorithm is asymptotically optimal. Proc. IEEE, 82(6):872-877, 1994.
[39] O. Yaron and M. Sidi. Performance and stability of communication networks via robust exponential bounds. IEEE/ACM Trans. Netw., 1(3):372-385, 1993.



